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Preface 


Program synthesis is the systematic, automatic construction of efficient executable 
code from high-level declarative specifications. AutoBayes is a fully automatic pro- 
gram synthesis system for the statistical data analysis domain; in particular, it solves 
parameter estimation problems, ft has seen many successful applications at NASA 
and is currently being used, for example, to analyze simulation results for Orion. The 
input to AutoBayes is a concise description of a data analysis problem composed 
of 1) a parameterized statistical model and 2) a goal that is a probability term in- 
volving parameters and input data. The output of AutoBayes is optimized and 
fully-documented C/C++ code that, given input data, computes values for those pa- 
rameters that maximize the probability term. Parameter estimation, clustering, and 
change point detection type statistical analysis problems can be described in this fash- 
ion. The output code can be linked dynamically into MATLAB™ 1 , Octave, and 
other environments. AutoBayes uses Bayesian Networks internally to decompose 
complex statistical models and to derive algorithms for their solution. Its powerful 
symbolic system enables AutoBayes to solve many subproblems symbolically rather 
than having to rely on numeric approximation algorithms, thus yielding effective, ef- 
ficient, and compact code. AutoBayes makes statistical analysis faster and more 
reliable, because effort can be focused on model development and validation rather 
than manual development of solution algorithms and code, which instead AutoBayes 
handles automatically. 


1 Matlab™ is a trademark of Mathworks, Inc. 
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1. Introduction 


Data analysis is the transformation of raw data (i.e., pure numbers) into a more 
abstract form, e.g., summarizing a set of measurements by their mean value and 
standard deviation as a bare minimum. For most data analysis tasks — especially tasks 
involving large data sets — computer support is necessary. AutoBayes is a program 
generator that generates statistical-method based scientific data analysis programs. 

The input to AutoBayes is a problem specification as you can see in Figure 1.1. 
This problem specification is a concise description of a data analysis problem; in 
fact, a parameter estimation problem. The specification is in the form of a statistical 
model with declarations, constraints, and a maximum-likelihood (ML) or maximum a 
posteriori (MAP) goal. AutoBayes internally uses Bayesian Networks to decompose 
complex statistical models and to derive algorithms for the solution. AutoBayes’s 
output is optimized and fully documented C/C++ code which can then be linked 
dynamically into Matlab™, Octave, or other environments. Input data is then 
given to the generated program to obtain analysis results in the form of estimated 
parameters. 

1.1 Key Features 

AutoBayes offers several unique features which result from using program synthesis 
technology and which make it more powerful and more versatile for statistical analysis 
than other tools and statistical libraries. 

• AutoBayes generates efficient procedural code from a high-level, declarative 
specification; the specification does not involve algorithmic information such as 
data or control flow. 

• Changes to the statistical model can be made without time consuming re- 
implementation of a data analysis program. 

• The automatically-generated code that solves the statistical problem is efficient 
and accurate, because it does not rely on numeric approximations when it can 
solve problems symbolically. 

• AutoBayes can generate different programs for the same application, each 
embodying differing solution algorithms or algorithm variants. 
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Figure 1.1: AutoBayes system architecture 


• With AutoBayes’s test data generator, the user is able to generate synthetic 
data according to the given model specification. This feature can be used to 
debug and validate the statistical model and to select a synthesized program 
which best fits the given application profile. 

The rest of this chapter discusses some applications that AutoBayes has been used 
for, giving a sense of its capabilities and breadth of applications. Chapter 2 describes 
how to install AutoBayes. Chapter 3 shows the step-by-step application of Auto- 
Bayes to solve the classic problem of clustering Fisher data about Iris flowers. This 
chapter explains the clustering problem specification, the commands to invoke Auto- 
Bayes and link it to Matlab™, and the results of the analysis. Chapter 4 briefly 
describes how AutoBayes works internally; the following chapter describes the kinds 
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of algorithms (e.g., clustering and maximization algorithms) that AutoBayes can 
embed in its generated code. Chapter 6 describes AutoBayes commands, hies, and 
hags. Chapter 7 defines the problem specihcation language. We have found that 
a specihcation for a new problem is most easily constructed by basing it on existing 
specihcations, so chapter 8 discusses many problems and their specihcations, as well as 
descriptions of the derivation of algorithms constructed by AutoBayes to solve those 
problems. Appendix A describes command line options accepted by AutoBayes that 
affect its behavior. Appendix B acknowledges the creators of AutoBayes and NASA 
support. Lastly, there is a bibliography and an index of terms. 

1.2 Applications of AutoBayes 

The AutoBayes system has been successfully applied to a wide held of different 
projects and tasks within NASA. In this section, we will briefly highlight some of the 
applications in order to give a glimpse of the system’s capabilities and applicabilities. 

1.2.1 Data Analysis on Large Software Simulations 

The analysis of large and complex parameterized software systems, e.g., understand- 
ing the results of simulations of aerospace systems, is very complicated and time- 
consuming due to the large parameter space and the complex, highly-coupled non- 
linear nature of the different system components. At NASA Ames, tools were devel- 
oped to facilitate the systematic exploration of parameter spaces. The tools use a 
unique combination of Monte Carlo, n-factor combinatorial parameter variations, and 
model-based generation of simulation and test cases [GBSMB08]. These cases (of- 
ten up to 10,000 of them) are executed by the Trick simulation environment [Vet07], 
usually generating huge amounts of data. 

In order to study the structure of these multivariate data and the dependency on the 
parameters, AutoBayes clustering models are being used to group these large data 
sets. A synergistic combination with the machine learning tool TAR3 then facilitates 
comprehensive root cause analysis. This tool is being used for abort and re-entry 
scenario analysis for Orion, as well as for a small-satellite guidance, navigation, and 
control system. Figure 1.2 shows results of a computational model of an earth-based 
small-satellite simulator entitled the “Hover Test Vehicle” (HTV). The first figure 
shows the expected variation in trajectory relative to various input parameters (such 
as center of gravity, mass of fuel, etc.). The clusters were ranked according to their 
landing velocity and position, with colors ranging from Blue (best outcomes) to Red 
(worst outcomes) The second plot shows the relationship between landing velocity and 
the initial wet mass of the vehicle. Through the use of this data, the flight rules for 



18 


Introduction 


a successful flight were defined. The flight test of the vehicle performed as suggested 
by the data. 



Figure 1.2: HTV trajectories colored according to their class (left). Relationship 
between landing velocity and wet mass (right). 


1.2.2 Data Analysis for Air Traffic Control Data 

Modern air traffic control (ATC) systems have to deal with a densely crowded airspace. 
In order to avoid conflicts and mid-air collisions, air traffic control systems, such as 
the CTAS (Center Tracon Advisory System), which has been developed at NASA 
Ames, include algorithms to predict the aircraft’s trajectories for several minutes into 
the future. Of course, the models underlying these algorithms must be as accurate as 
possible, despite many unknowns and high noise. 

AutoBayes was used for a study to perform data mining on actual aircraft trajec- 
tories (data from ATC and radar). One task was to find the transition point between 
a constant CAS (calibrated airspeed) and a constant mach climb, which occurs in 
typical trajectories. Figure 1.3 illustrates such a scenario. The altitude of the aircraft 
over time is shown as well as the development of CAS and mach speed. Given the 
CAS and mach speed over time the task is to find the most likely transition point 
(vertical line). A simple AutoBayes specification solves this task (for details, see 
Section 8.3.3). We assume linear, but noisy behavior of the speeds. Thus, the CAS 
trajectory can be defined by 

_ j caso for t <t 0 

CaSt y case — cas r (t — t 0 ) for t > t 0 

with unknown parameters cas o, cas r , and the transition point to- The same can be set 
up for the mach number. If we assume that the actual data are noisy and Gaussian 
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distributed, it is easy to write a statistical specification to estimate the unknown 
parameters. Figure 1.4 shows the actual result of the AutoBayes analysis. The code 
generated by AutoBayes has reliably estimated the unknown parameters and the 
transition point (blue and green). For this analysis, large data sets consisting of more 
than 10,000 climb scenarios have been analyzed, resulting in figures like Figure 1.5 
which shows the most likely mach numbers and altitudes of the transition for a specific 
type of jet aircraft. 



Figure 1.3: Aircraft climb trajectory with altitude profile (solid), CAS (dashed), and 
mach profile (dash-dot) 

1.2.3 Shape Analysis of Planetary Nebulae 

Planetary nebulae are remnants of dying stars. Scientists try to understand the 
physics of these nebulae by analyzing data, in particular by analyzing images taken 
by the Hubble Space Telescope (HST). Figure 1.6a shows the image of planetary neb- 
ula IC418 (the “Spirograph” Nebula). Although coming in most different shapes and 
forms, different regions of the nebula can be distinguished easily: the dwarf star in the 
center, the core, and the outer hull. Automatic analysis of pictures requires statistical 
data analysis models that estimate the center and elliptical extent and orientation 
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Figure 1.4: Results of analysis with AutoBayes for three climb scenarios. The actual 
trajectory is shown in red, blue lines correspond to the parameters as estimated by 
AutoBayes code 



Figure 1.5: Likelihood of transition point in an altitude over mach coordinate system 
(left) and in a CAS over mach coordinate system (right). Two major transition points 
at altitudes of approximately 26,000ft and 31,000ft are detected. 

of the nebula. In close collaboration with scientists at NASA Ames, several Auto- 
Bayes models were defined to estimate center and extent of a nebula. Figure 1.6b 
shows the manually masked image, which has been taken as the basis for the analysis. 
Following the approach in [KH02] of using a hierarchy of statistical models to estimate 
the nebula’s parameters, a first model uses a 2-dimensional spherical Gaussian model 
to estimate center (x 0 ,y 0 ), intensity i 0 , and radius r of the nebula, where intensity F 
is modeled as 

(x’ U -a :) 2 + (y 0 -y ) 2 

F(x,y) = io ■ e 2r 2 (1.1) 

In the statistical model, the data (pixel intensity) d(x,y ) is distributed as d(x,y ) ~ 
F(x, y) + rj with white noise rj. 
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Figure 1.6c shows the original image superimposed with the estimates of the param- 
eters of the Gaussian model. Figure 1.6d shows sampled data generated using the 
estimated Gaussian model parameters. 



Figure 1.6: Planetary nebula IC 418 or Spirograph Nebula (a) Composite false-color 
image taken by the HST (Sahai et ah, NASA and The Hubble Heritage Team). The 
different colors (resp. gray-scales) indicate the different chemicals prevalent in the 
different regions of the nebula; the origin of the visible texture is still unknown. The 
central white dwarf is discernible as a white dot in the center of the nebula, (b) 
Manually masked original image, (c) Original image with estimated Gaussian model 
parameters superimposed, (d) Sample data generated using estimated Gaussian model 
parameters. 

In a second set of models, image segmentation techniques were used. Here, image 
segmentation via clustering was used. The AutoBayes model, developed for this 
application is a two-dimensional mixture (of Gaussians) model. AutoBayes gen- 
erates a customized EM algorithm. Figure 1.7 shows results using clustering-based 
image segmentation techniques. Models without and with geometric refinement, i.e. , 
constraints about the shape are shown. Note that these AutoBayes models are 
insensitive enough against several image artifacts (e.g., the “spokes”). 

Results of this studies were published in [FHKS03, FS03a, KH02], 



Figure 1.7: Segmentations of IC418 image: (a) three-class segmentation (b) ditto, 
white class used as mask load to original image (c) five-class segmentation, and (d) 
sample data from geometrically refined segmentation model; class 1 (white) corre- 
sponds to the hull, class 2 (gray) to the core, and class 3 (not shown) to the back- 
ground. 
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1.2.4 Clustering for Sloan Digital Galaxy Survey 

For understanding the large scale structure of the universe, mapping of structures of 
all scales (galaxies to large walls) is important. The Sloan Digital Sky Survey (SDSS) 
contains photometric images including accurate redshifts. The classification of galaxy 
distances based on redshift using advanced Bayesian data analysis techniques has 
been explored at NASA Ames. [SSF05] describes an ensemble approach to building 
Mercer Kernels with prior information. These data adaptive kernels can encode prior 
information and have been learned directly. For this research work, AutoBayes 
was used to estimate the parameters for the kernels. A very compact AutoBayes 
specification (a mixture model with priors) produces an efficient customized variant 
of the EM algorithm. [SSF05] describes the approach; the mathematical derivation in 
this paper has been automatically generated and typeset by AutoBayes. Figure 1.8 
(left) shows the clustering of a small section of the sky using spectroscopically de- 
termined redshifts. Dots indicate galaxies on filaments, crosses indicate field-galaxies 
(not in filaments). The log-likelihood of the AutoBayes Gaussian mixture model 
(without priors here) is shown in Figure 1.8 (right) as a function of number of com- 
ponents in the model. This diagram shows that there is substantial variation due to 
the well-known sensitivity of the EM algorithm to its (random) initialization. 



Figure 1.8: Clusters of galaxies (left) and log- likelihood of the AutoBayes model as 
a function of model components (number of classes, right). 


1.2.5 Hyperspectral Clustering of Earth Science Data 

Earth-observing satellites like MODIS have a hyperspectral camera. This means that 
a certain image is taken with a multitude of different wavelengths in the infrared 
spectrum. The resulting data block is called a hyperspectral data cube (Figure 1.9, 
left). Since different ground cover (e.g., trees, grass, buildings, water) and different 
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minerals (granite, sand, limestone) reflect the light with a different intensity for dif- 
ferent wavelength, scientists can use these data to analyze ground coverage, detect 
wild fires, or find dying trees. 

A simple multivariate mixture model developed with AutoBayes can easily cluster 
such a hyperspectral cube. The resulting most probable class assignment for each 
pixel has been used to color the image according to the estimated groups (Figure 1.9, 
right). The various different ground covers (grass, water, marshland, etc) can be 
seen easily. In this case, AutoBayes was asked to produce 5 different classes. This 
clustering gives an easy to interpret result (Figure 1.9 (right)). With a larger number 
of classes more subtle variations are uncovered. 



Figure 1.9: Hyperspectral image cube (MODIS) (left) and clustering result for hyper- 
spectral data and 5 classes as produced by AutoBayes (right). 


1.2.6 Clustering and Mapping of Geospatial Data 

Census data are in general multivariate data. For each household, different variables, 
like size of household, income, renting or owning are assembled. These data are related 
to the ZIP code. With a simple AutoBayes multivariate clustering model, such data 
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can be processed and visualized as shown in Figure 1.10. AutoBayes can be easily 
interfaced into visualization tools such as NASA’s World Wind 1 or Google Earth. 
Figure 1.10 shows the results of a multivariate clustering of US census data along the 
dimensions household size, rented/owned living quarters, age, and male-female ratio 
for kids and adults. In the given coding, a total of 11 variables were used. These data 
were available with the ZIP code as their index. Figure 1.10 reveals pretty distinct 
classes for the region around Anchorage, the North-Western part, and the Aleutes 
and can form the basis for further analyses and interpretation. 



100 200 300 400 500 600 700 800 900 1000 1100 


Figure 1.10: Multivariate clustering of census data and mapping onto geographical 
centers of ZIP codes in parts of Alaska. 


1.2.7 Detection of Gamma-ray Spikes 

Gamma-ray bursts are very powerful electromagnetic flashes of gamma rays. It is 
thought that these bursts have to do with the collapse of a high-mass star into a black 
hole. Instruments in orbit, e.g., the BATSE (Burst and Transient Source Explorer) 
on the Compton Gamma Ray Observatory (launched 1991) continuously measure 


1 http : / / wor ldwind . ar c . nasa . gov 
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the intensity of gamma rays. A simple AutoBayes model can be used to detect 
and isolate such burst events. We assume that the inter-arrival time of photons is 
exponentially distributed. A detector for a switchpoint, i.e., the transient between 
a (low) arrival rate and a higher arrival rate can be specified in the AutoBayes 
specification language in a few lines. The generated program successfully isolates 
recognized bursts. 
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2. Installation 


AutoBayes runs under Linux and is implemented in SWI-Prolog. The prerequisites 
for AutoBayes are as follows: 

• CPP (C Preprocessor) 

• SWI Prolog 

• DOT Language (For graphical representation of Bayesian Networks) 

• Matlab™ or Octave 

2.1 Hardware Requirements 

• Solaris 

• Mac OSX 

• Linux 

• Windows with cygwin 

2.2 Installation Requirements 

2.2.1 C Preprocessor 

CPP is the preprocessor for the C programming language and is a requirement for 
AutoBayes. It is included, e.g., with the GNU C-compiler and is by default available 
for most platforms. 

2.2.2 Installing SWI-Prolog 


The stable version (Version 5.6.x or higher) of SWI-Prolog can be downloaded from 
http : //www. swi-prolog. org/. 
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2.2.3 GraphViz 

The GraphViz tool is used for generating layouts of the graphical representations of 
Bayesian Networks. Please refer to the Graphviz website http : //www . graphviz . org/, 
and click on the Download tab and follow the instructions in order to have this package 
installed on your system. Only the tool dot is used with the AutoBayes system. 

2.2.4 Matlab™ or Octave 

Matlab™ or Octave is needed to compile and run the generated synthesized code. 
Matlab™ is available at http : //www . raathworks . com and Octave can be obtained 
from http : //www . octave . org. OCTAVE is an open source program system for scien- 
tific computation which is very similar to MATLAB™. 

2.3 Getting AutoBayes 

AutoBayes can be obtained through two methods: by the source tar-file, or by 
checking out from a CVS repository. Currently, the CVS repository can only be 
accessed from NASA Ames, Code TI machines. 

2.3.1 Source TAR File 

You can unpack the AutoBayes source tar-file autobayesA'. Y. tgz, where X.Y is 
the version number, in the Unix environment by using the following command: 

% tar zxvf autobayesA. Y. tgz 

or the gtar command: 

% gtar zxvf autobayesA. Y. tgz 

Note: Unpacking a tar-file will write its contents to the current directory. 

In the sequel, the directory where AutoBayes was unpacked will be called 
autobay eshome. 

2.3.2 AutoBayes CVS Repository 

You can use CVS to checkout copies of projects if you have permission for CVS user 
access. CVS is a good method for keeping track of modification made to project 
source hies. 


Obtain the AutoBayes tool through a CVS checkout from the top-level directory 
using the command line: 
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cvs -d :pserver : usemame&projectname. domain, net: /cvs co projectname/top-level- 
directory 

For example: 

cvs -d wow . arc . nasa. gov: /home/user/cvs co PN 

2.4 Building and Setting Up AutoBayes 

After unpacking the AutoBayes tar-file or checking out the tool from the repository, 
edit the hie autobayeshomeMatkei ile so that it contains the correct path for Prolog. 
The variable PL should be set either to pi or the exact path where SWI-Prolog has 
been installed. 

In addition, variables in the Makefile in the following directory 
autobayeshome/system/SWI/ have to be set to the appropriate paths as follows. 

The variable INCLUDEDIR must be set to the location of the hie SWI-Prolog. h. For 
example 

INCLUDEDIR=/usr/local/pl-5 . 6 . 29/lib/pl-5 . 6 . 29/include 

The variable PN_LIB must be set to the location of the hie libpl.a. For example 

PN_LIB=/usr/local/pl-5 . 6 . 29/lib/pl-5 . 6 . 29/lib/i686-linux/ 

We can now use the make command into the bash shell to compile and build Auto- 
BAYES: 

% cd autobayeshome 
% make autobayes 

This command starts the Prolog system, loads the source hies, and produces an exe- 
cutable hie named autobayes in the autobayeshome directory. 

You have to set the shell variable AUTOBAYESHOME to the autobayeshome directory; 
e.g., in your .bashrc hie: 

export AUT0BAYESH0ME = autobayeshome 

Then move autobayes into a binary directory or adjust the PATH accordingly; e.g., 
export PATH=./ : autobayeshome : ${PATH} 
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3. Iris Classical Example 


The Fisher Iris flower data set is a multivariate data set introduced by Aylmer Fisher 
in 1936 as an example of discriminant analysis. The data set contains 50 samples from 
each of three types of Iris flowers ( Iris setosa , Iris Virginia , and Iris versicolor) . Each 
sample has four dimensions, petal length, petal width, sepal length, and sepal width. 
Although the data set is labeled, for the purpose of this example, we just consider the 
4-dimensional unlabeled data set and ask for a classification. 

This dataset was used by Fisher in his initiation of the linear discriminant model to 
classify the flower types based on the combination of the four dimensions. 

In this chapter we will: 

1. construct a model for the Iris example, 

2. invoke AutoBayes on the model and generate program code, 

3. compile and run the generated program, 

4. provide the Iris data set as input and analyze the results. 

3.1 Constructing an AutoBayes Model for Iris Flower Set 

As we already know, the first step is to construct a statistical model using the Au- 
toBayes specification language. This section will guide the user through the Auto- 
Bayes model for this example. Chapter 8 provides a more complete and thorough 
description of various AutoBayes input model examples. Chapter 7 discusses details 
of the AutoBayes specification language. 

In Fisher’s classical data set, there are 3 classes of 50 instances each. The 3 classes 
represent the types of the Iris flower: Setosa, Versicolor and Virginica. In our example, 
we want unsupervised clustering, that is, we will ignore the labels on the data set. We 
furthermore assume that all measurements are Gaussian distributed, which means 
that we have a mixture of Gaussians problem. In order to keep the specification 
compact, we represent the 150 data sets with 4 features each as a matrix of size 
4 x 150. 

Please note that AutoBayes uses a 0-based indexing of all vectors and arrays. In 
our example, the data matrix is indexed data ( 0 .. 3 , 0 .. 149 ). 
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Listing 3.1 shows the specification. First, we declare the name of the model (Line 1) 
and the various constants and parameters. Each of the statements ends with a period 
and comments are attached to a statement with as, with strings delimited by a 
single quote Line comments are preceded by a %. In order to be as flexible as 
possible, all the dimensions are declared as symbolic constants. In the declaration, 
we write const nat to obtain a natural number constant that can be zero or a positive 
integer. We know that the number of classes (3) is positive and much smaller than the 
number of data points (150). We therefore can provide this knowledge as additional 
constraints. These guide the AutoBayes system to select the right algorithms and 
they are checked when the generated code is called. If such a constraint is violated, 
e.g., if we want to cluster 10 data points into 3 classes (which does not make any sense 
from a statistical point of view), a run-time error is produced. 

The unknown parameters of our model are 

• the type frequency phi (0) is a probability vector returning how often each class 
occurs in the data set. As a probability vector, it adds up to one, i.e. , 

2 

= 1 

i = 0 

Line 11 reflects this constraint. Please note that in the AutoBayes specifica- 
tion language the (scalar) assignment is denoted by a :=, whereas ‘ ” is used 
for equality comparison. 

• mean values /i ?J - and standard deviations a tj , declared as matrices, denoting the 
parameters for the Gaussian for each feature i and each class j. Their data type 
is double. Since we know cr.^- > 0, we specify this as an additional constraint in 
Line 15. The underscores mean “all values” in the sense of the operator in 
Matlab™. 

1 model iris as 

2 ’Simple multivariate clustering model for classical Iris flower 

EXAMPLE ’ . 

3 

4 const nat n.variables as ’Number of features’. 

5 const nat n.points as ’Number of data points ’ . 

6 const nat n_classes as ’Number of classes ’ . 

7 where 0 < n .classes. 

8 where n .classes <C n.points. 

9 

10 double phi ( 0 . . n.classes — 1) as ’Class probability vector. ’ . 
n where sum (I := 0 .. n .classes— 1, phi (I)) = 1. 


12 
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13 double mu( 0 . . n_variables — 1, 0 . . n_classes — 1) as ’Matrix of means’ . 

14 double sigma ( 0 .. n_variables — 1, 0 . . n_classes — 1) as ’Matrix of std devs ’ 

15 where 0 < sigma 

16 

17 output nat class .assignment (0 . . n_points — 1) as ’Class of each point’ . 
is class .assignment (_ ) ~ discrete ( vector ( I := 0 .. n.classes— 1, phi(I))). 

19 

20 data double i r is _d at a ( 0 . . n.variables — 1 , 0 . . n.points — 1) . 

21 iris .data (C, I ) ~ gauss (mu(C, class.assignment ( I ) ) , sigma(C, 

class.assignment(I))) . 

22 

23 max pr( { iris.clata } | { phi , mu, sigma }) for { phi , mu, sigma }. 

Listing 3.1: AutoBayes model for Fisher’s Iris data set. 

We specify a vector class_assignment, which stores the most probable class for each 
data point. This means that it is of length 150, and its values can be 0 (class I), 1 
(class II), and 2 (class III). This class assignment is unknown and is to be estimated. 
Since we want to have the estimated vector returned, this statement is marked with 
the keyword output. Lines 20-21 declare the data iris.data as a matrix, which is 
Gaussian distributed. The distribution is along the features with separate standard 
deviations for each class and feature. 

Line 23 contains the goal statement, telling AutoBayes the kind of statistical prob- 
lem that needs to be solved. In our case, we want to estimate the unknown parameters 
of the mixture; i.e. , we need to maximize the probability of the data given the param- 
eters 

max Pr (iris_data|{0, //, a}) 

for the unknown parameters. The specification is a direct transcript. 

At the end, we save our model as iris . ab and use it in the following steps. 

Invoking AutoBayes on the Iris Input Model 

As we mentioned earlier, AutoBayes is a code generator for scientific data analysis 
programs. We now have a statistical model for our scientific data set (Fisher’s data 
set). The next step is to apply AutoBayes to the model we just constructed and to 
generate program code to use for our final goal of data analysis. For our example, we 
want to generate a data analysis function that can be invoked from either the Octave 
environment 1 or Matlab™, depending on your working environment preference. 

1 Octave is an open source program system for scientific computation, which is very similar to 
Matlab™. Octave can be downloaded from http://www.octave.org 
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3.2.1 Generating Code for OCTAVE 

To generate C++ code which can be directly called from Octave as a function, invoke 
Auto Bayes as follows: 

% autobayes -instrument iris.ab 

In our example, a file with the name iris . cc will be generated. The file name (in our 
case iris) is the same as the model name in the AutoBayes specification (Line 1 
in Listing 3.1). In the sequel, we will use this name as the default. 

3.2.2 Generating Code for Matlab™ 

Other target languages and platforms can be selected using the -target <option> 
command line option. To generate code that can be called from MATLAB™, invoke 
AutoBayes as follows: 

% autobayes -instrument -target matlab iris.ab 

This will generate the program in C. In our example, a hie with the name iris.c 
will be generated. The hie name (in our case iris) is the same as the model name in 
the AutoBayes specification (Line 1 in Listing 3.1). In the sequel, we will use this 
name as the default. 

3.2.3 Flags 

The -instrument hag after the autobayes command is used to display and store the 
convergence error at each iteration cycle of the clustering algorithm that it employs 
in the generated code. 

Additional command line options can cause the AutoBayes system to produce dif- 
ferent variants of the data analysis algorithms, varying the kind of algorithm (e.g., 
EM, fc- means) , initialization, or termination conditions. Details about command line 
options are described in Appendix A. 

3.3 Compiling and Running the Iris Generated Program 

Now that we have a generated C++ or C program, we want to dynamically link it into 
the Octave or Matlab™ environment. This section will go through the steps to 
compile the generated program and create an oct-hle for the OCTAVE environment or 
mex-hle (using the hie created with -target matlab) for the MATLAB™ environment. 
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3.3.1 Compiling and Running the Generated Program for the OCTAVE 
Environment 

For us to be able to run the program, we will need to first compile the hie 

iris . cc and generate a binary hie with the extension . oct. We will use the mkoctf ile 

command to generate this hie: 

> mkoctf ile -I$AUTOBAYESHOME/system/octave/include iris.cc 

This command will generate the hie iris. oct. We then enter the OCTAVE environ- 
ment by typing octave into the command line. Once we enter OCTAVE we can call 
the program we just generated using previous command by typing its name and the 
appropriate arguments as described in Section 3.4. 

3.3.2 Compiling and Running the Generated Program for the MATLAB™ 
Environment 

For working in the MATLAB™ environment, we use the iris . c hie that we generated 
using the -target matlab hag along with the mex command. We type into the 
MATLAB™ or shell command line: 

» mex -I$AUTOBAYESHOME/system/matlab/include iris.c 

This command will generate a dynamically linked binary hie iris.x where x de- 
pends on the operating system (e.g., mexmaci: Mac OSX, rnexglx: Linux, mexw32: 
Windows 32-bit). We now simply call the model by typing its name iris into the 
Matlab™ command line and it will show the proper usages for the model. 

3.4 Providing Input to the Iris Program and Analyzing Results 

In order to run the generated program, we enter the OCTAVE or Matlab™ environ- 
ment. In the following, we will use Octave, but Matlab™ works the same. We 
enter the OCTAVE environment by typing octave to the command line prompt. 

The next step is to call the model from the OCTAVE environment; in our case, we 
labeled our model iris. If we simply type the model name to the OCTAVE command- 
line, the model usage is returned 2 : 

octave :1> iris 

usage: [vector c 1 as s_ assignment , matrix mu, vector phi, 

matrix sigma, vector errors] = 

2 We reformatted the output to fit within the margins of this page. 
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iris (matrix iris_data, int n_classes, 

double tolerance , int maxiteration) 

In order to cluster the data, we now load the input hie that contains the entire data 
set of 150 samples for the Iris flower. In our case, we have named the hie iris_data.m. 
This script sets the variables Class, PetaldLength, Petal_width, SepaldLength, and 
Sepal_width. The script can be loaded simply by typing its name to the OCTAVE 
command prompt: 

octave :4> iris.data 

One way to make sure your model is linked to your dataset is to use the who command 
after calling the input script. The who command will show you the dynamically-linked 
functions and also the local user variables for the Iris model, for example: 

*** dynamically linked functions: 
iris 

*** local user variables: 

Class Petal_length Petal_width Sepal_length Sepal_width 

octave : 6> 

The variable Class contains the labels, as strings. This variable is not used for our 
example as we will do unsupervised clustering. The other variables (PetaldLength, 
Petal_width, SepaldLength, and Sepal_width) are vectors of size 1 x 150 which 
contain the measurements. 

You can use the whos command which returns the dynamically linked function and 
also the local user variables along with row and column information for each variable: 


octave :6> whos 

*** dynamically linked functions: 

prot type rows cols name 

r — dynamically-linked function - - iris_classical_example 


*** local user variables: 
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prot 

type 

rows 

cols 

name 

rwd 

string 

1 

2000 

Class 

rwd 

matrix 

1 

150 

Petal_length 

rwd 

matrix 

1 

150 

Petal_width 

rwd 

matrix 

1 

150 

Sepal_length 

rwd 

matrix 

1 

150 

Sepal_width 


octave : 7> 

Now we need to define a matrix containing all the measured data. As defined in the 
AutoBayes model, this matrix is of size 4 x 150 (n_features x mpoints) . We will 
call this matrix data and set its values as below: 


octave :7> data = [Petal_length; Petal_width; Sepal_length; Sepal_width ]; 
The next step is to invoke the iris program with input and output parameters: 

[c, mu, phi, sigma, errors] =iris (data, 3, 0.00001, 30) ; 


where: 

c 

class assignment vector 

mu 

mean 

phi 

class frequency (class probability) 

sigma 

standard deviation 

errors 

vector to store errors 

3 

number of classes 

data 

matrix containing Iris data 

0.00001 

convergence error for termination 

30 

maximum number of iterations 


When this command is called, you should receive an output similar to the following: 

octave: 10> [c, mu, phi, sigma, errors] = iris(data, 3, 0.00001, 30); 
pvar(89) = 3.25039 
pvar(89) = 1.36758 
pvar(89) = 0.445656 
pvar(89) = 0.733875 
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pvar (89) 
pvar (89) 
pvar (89) 
pvar (89) 


0.00627092 

0.00575855 

0.00528546 

0.00484893 


Please note that since the initialization process is random, each time yon get different 
results. 

For each iteration of the expectation-maximization (EM) algorithm (Section 5.1.2), 
the current error is displayed (Section 8.2.2). This error should become smaller as 
the algorithm proceeds, but “jumps” are possible. If the error becomes smaller than 
the given “tolerance” (10~ 5 in our example) or if the number of iterations exceeds 
“max iterations” (30 in our example), the algorithm terminates and sets the result 
variables. Figure 3.1 shows the development of the error; this is just a semilog plot of 
the variable “errors”. This plot can easily be generated by typing semilogy (errors) . 


3.4.1 Executing Commands and Reading Outputs 

Now that you have executed the model and generated results, it is time to view them. 
As specified in the AutoBayes model, the algorithm estimates the parameters of 
the Gaussians //, a 2 for each class as well as the class frequency of </>. Also the most 
likely class assignment c is returned as it has been declared as an output variable, c 
is a vector of size 150 with entries 0, 1, 2 which correspond to the class assignment 
(classes I, II, and III, respectively) for each measurement. We will use the variable c 
to visualize the clustering. 

Type mu into the command line to view the means for all three classes and their 
features. For example you should get similar outputs to below where columns are 
classes I through III, and the rows are features 1 through 4 (petal length, petal width, 
sepal length, sepal width): 

octave :15> mu 


mu = 


5.49353 

1.99709 

6.62647 

3.01786 


1.46400 

0.24400 

5.00600 

3.41800 


4.23318 

1.30830 

5.84463 

2.70497 


octave : 16> 
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You can also type phi or sigma to view the probability of classes I— III and standard 
deviations respectively. 

For example: 

octave : 16> phi 

phi = 

0.35589 

0.33333 

0.31078 

octave : 17> sigma 
sigma = 

0.56852 0.17177 0.47892 

0.28738 0.10613 0.18796 

0.57168 0.34895 0.48216 

0.28793 0.37719 0.29656 

octave : 18> 

3.4.2 Interpretation of Results 

With the results provided by AutoBayes, the means /i, the standard deviations a 2 , 
the class frequencies </>, and the most-likely class assignment c, we now can start to 
interpret and visualize the results. 

It is hardly surprising that the class frequency </> is, for each of the classes, around 0.3, 
which means that roughly 30% of the measurements fall into each of the three classes. 
The original set is actually constructed such that exactly 1/3 of the measurements 
(50 out of 150) belong to each of the three types of Iris flowers (classes). Interestingly, 
however, the numbers found by the AutoBayes model are not exactly those, which 
means that some of the data points have been misclassihed. A closer look at the data 
points reveals more information. Please note that we do not use the class label, which 
is provided with the original data set. Rather, our aim was to separate the data as 
best as possible into three classes without knowing the answer. 

Figure 3.2 shows a simple scatter plot of the 150 data points along the various pairs 
of features. Because there are 4 variables, a 4 x 4 matrix is needed. This figure shows 
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Figure 3.1: This graph shows the logarithm of the error term convergence on the 
y-axis and time (iteration) on the x-axis. 


only the upper triangular part of the scatter matrix. In all plots, two groups can 
easily distinguished. For the human eye, a classification of the data into three classes 
seems to be impossible. 

After running the AuToBAYES-generated code, we present the same scatter plots, 
however this time, each data point is colored according to its most probable class 
membership c,; (Figure 3.3). 

Finally, the governing parameters of the three Gaussians (fa, of) can be plotted into 
such a scatter-plot. Figure 3.4 shows such a plot for a selected pair of features. The 
thick magenta dots represent the center of the Gaussian, the cyan ellipsis corresponds 
to a lex 2 ellipsis. Also, we have indicated Fisher’s given classification by plotting three 
different plot symbols ((g), ©, ©). In a perfect classification, each kind of plot symbol 
would be the same color. With such a visualization, it can easily be seen how the 
data have been separated and where potential mis-classihcations occur. These figures 
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are only a few possibilities for visualization of the results. 
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Figure 3.2: Scatter-plot for each of the four variables of the iris data set. 
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Figure 3.3: Scatter-plot for each of the four variables of the Iris data set. Most likely 
classes for each data point are shown in different colors. 
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Figure 3.4: Scatter-plot for the features Petal length versus Sepal length with esti- 
mated class parameters /i, a 2 shown. 
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3.4.3 MATLAB™ Scripts for Iris Plots 

Listings 3.2 and 3.3 show the Matlab™ scripts used to generate Figures 3.2 and 
3.3, and Figure 3.4, respectively. The Auto Bayes- generated clustering algorithm 
has been executed beforehand, so the estimated parameters mu, sigma, and class 
assignment c are available. 


1 figure ; 

2 S = 5; 

3 

4 subplot (3,3,1); 

5 scatter (Petal-length , PetaLwidth ,S,c); 

6 x lab el ( ’ Petal u Length ’ ) ; y label ( ’Petal ^ Width ’ ) ; 

7 

8 subplot (3,3,2); 

9 scatter (Petal.length ,Sepal_length , S , c) ; 

10 x lab el ( ’ Petal u Length ’ ) ; y label ( ’Sepal ^ Length ’ ) ; 

11 

12 subplot (3 ,3 , 3 ) ; 

13 scatter (Petal.length , Sepal .width, S,c); 

14 x lab el ( ’ Petal u Length ’ ) ; y label ( ’Sepal ^ Width ’ ) ; 

15 

16 subplot (3,3,5) ; 

17 scatter (Petal .width, Sepal-length ,S,c); 

is xlabel( ’Petal u Width ’) ;ylabel( ’Sepal u Length ’ ) ; 

19 

20 subplot (3,3,6); 

21 scatter (PetaLwidth , Sepal .width ,S,c) ; 

22 xlabel( ’Petal u Width ’) ;ylabel( ’Sepal u Width ’ ) ; 

23 

24 subplot (3,3,9) ; 

25 scatter (Sepal .width, Sepal.length ,S,c); 

26 xlabel( ’Sepal u Width ’) ;ylabel( ’Sepal u Length ’ ) ; 

Listing 3.2: Matlab™ script used for generating Iris scatter plots 
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1 figure ; 

2 

3 S = 5; 

4 

5 C=[’r ’,’gVb’]; 

6 for i = 1 : 3 

7 cm=f i n d ( c=i — 1 ) ; 

8 plot ( Petal-length (cm) , Sepal-length (cm) , [ C( i ) ’ o ’ ] , ’ markersize 

’,io’); 

9 

10 hold on ; 

11 end ; 

12 

13 plot(Petal_lengtli(l:50) ,Sepal_length(l:50) ,’kx’); 

14 plot (Petal-length (51:100) , Sep a l_length (51:100) ,’k. ’); 

15 plot (Petal-length (101:150) ,Sepal_length(101:150) ,’k+’); 

16 xlabel (’ Petal -Length ’) ; 

17 y label ( ’ Sepal -Length ’ ) ; 
is i 1 = 1 ; 

19 i 2 = 3 ; 

20 

21 hold on ; 

22 

23 for cl =1:3, 

24 plot (mu( il , cl ) ,mu( i2 , cl ) , ’m. ’ , ’ Markersize ’ ,25) ; 

25 

26 for p = 0 : 0 . 1 : 2 * pi , 

27 x = mu( il , cl ) + sigma ( il , cl ) *cos (p) ; 

28 y = mu( i 2 , c 1 ) + sigma (i2 , cl)*sin(p) ; 

29 plot (x , y , ’ex’) ; 

30 end ; 

31 end ; 

Listing 3.3: Matlab™ script for scatter-plot with estimated parameters 
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4. System Functionality 


In this chapter, we will describe the internal workings of the AutoBayes system. 
First, an overview of the architecture of the AutoBayes system will be given. Then, 
we will discuss how the code and the documentation is generated. Finally, we will 
describe how AutoBayes can be used to generate artifical (random) data for the 
given statistical model. 

4.1 Overview 
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Figure 4.1: Principle Architecture of AutoBayes 

In a first processing step, the given specification is parsed, converted into internal 
form, and the Bayesian network is constructed. This step can also generate an external 
representation for visualization purposes, using the graph drawing tool graphviz 1 . The 
synthesis kernel then analyzes the network, tries to solve the given optimization task, 
and instantiates appropriate algorithm schemas, which are given in a schema library. 

http : //www . graphviz . org 


l 
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Figure 4.1 shows the principle architecture of AutoBayes. The output of the 
synthesis kernel is a program in a procedural intermediate language. AutoBayes’s 
backend takes the intermediate code, optimizes it and generates code for the chosen 
target system. Currently, C or C++ code is generated, which can be used in a stand- 
alone way, or can be linked to the Matlab™ or Octave environment. The synthesis 
kernel also produces detailed documentation in the form of an HTML design document 
along with the code. Furthermore, AutoBayes can generate code which generates 
artificial sampling data for the model, e.g., for visualization and testing purposes. 

All parts of the AutoBayes system rely heavily on a symbolic subsystem and some 
auxiliary system modules (e.g., pretty-printer, set representations, I/O functions). For 
symbolic mathematical calculations, we implemented a small but reasonably efficient 
rewriting engine in Prolog. Graph handling, simplification of mathematical expres- 
sions, and an equation solver are implemented on top of it. The system architecture is 
designed in such a way that most of its parts can be re-used for different domains. In 
particular, the backend and symbolic subsystems are entirely independent of the data 
analysis domain. For details on the principles of schema based program synthesis see 
[FS03b] . 

4.2 Generating Code 

The synthesis kernel of AutoBayes generates code in an intermediate language before 
the code for the actual target system is produced. This intermediate language is a sim- 
ple procedural language with several domain-specific extensions such as convergence 
loops, vector normalization, simultaneous vector assignment, as well as assertions and 
annotations. The domain-specific constructs allow target-specific optimizations and 
transformations. For example, the sum construct of the intermediate language, for 
calculating the sum of vector elements, can be converted into a “for” loop, an iterator 
construct for sparse matrices, or a function call to a library. The language supports 
most basic matrix operations. 

The actual target-specific portion of the code generator is rather straightforward and 
can be adapted to different target languages and enviroments. With the help of rewrite 
rules all constructs of the intermediate language are transformed into constructs of the 
target language and printed using a generic pretty-printer. The backend also generates 
boiler-plate code to interface the algorithm with the target system, and optimizes the 
code. However, standard optimizations (e.g., evaluation of constant expressions) are 
left for the subsequent compilation phase. There is no need to perform the same 
optimization steps that are already in many modern compilers. 
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4.3 Generating Documentation 

Certification procedures for safety-critical applications (e.g., in aircraft or spacecraft) 
usually mandate manual code inspection. This inspection requires that the code is 
readable and well documented. Even for programs not subject to certification, under- 
standability is a strong requirement because manual modifications are often necessary, 
e.g., for performance tuning or system integration. However, existing code generators 
often produce code that is hard to read and understand. In order to overcome this 
problem, AutoBayes generates explanations along with the programs that show 
“synthesis decisions” : which algorithm schemas have been used, how schema parame- 
ters have been instantiated, etc. Model assumptions that were used and proof obliga- 
tions that could not be discharged during the synthesis arc laid out clearly. This makes 
the synthesis process more transparent and provides traceability from the generated 
program back to the model specification. In addition to the design document, Auto- 
Bayes can also generate the mathematical derivation of the generated algorithm as 
a HTgX- document , when the command-line flag -latex is used. For illustration pur- 
poses, Chapter 8 shows the autogenerated derivations of selected examples, enclosed 
between begin autogenerated document and end autogenerated document. 

4.4 Generating Artificial Test Data 

Visualization and simulation play important roles in the development of data analysis 
programs. An AutoBayes model specification contains enough information to syn- 
thesize code which generates artificial sampling data according to the specification. 
Generating artifical test data is very helpful in understanding the model and the gen- 
erated code. If the artificial data does not match the real data set or the scientist’s 
expectations, the specified model might not reflect the reality properly. Artificial 
data sets can also be used to assess and evaluate the performance of the synthesized 
code before real data become available. This feature is of particular interest in cases 
where AutoBayes allows the instantiation of different algorithms for the same spec- 
ification. For example, if AutoBayes synthesizes different variants for initialization 
of the hidden variable, their coarse relative performance can be assessed with the 
generated test data. 

Code for the generation of artificial test data can be produced easily by using the 
commandline flag -sample on the AutoBayes specification. The system then gen- 
erates a hie samplevnodei . cc that can be compiled in the usual way. Values for all 
constants and the parameters which are to be estimated must be provided when this 
function is called. It then returns random values for the data and hidden variables. 
For example, the Iris example of Chapter 3 would generate the following sampling 
function 
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[vector class_assignment , matrix iris_data] = 

sample_iris_classical_example (matrix mu,int n_points .vector phi, matrix sigma) 

Figure 4.2 shows a scatterplot of an artificially generated data set with 150 data 
points. A comparison with Figure 3.3 on page 3.3 shows that the artificial sampling 
data, generated by our statistical model, are very similar to the original data set. 
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Figure 4.2: Scatter-plot shown artificially generated sampling data from the Auto- 
Bayes iris model (Listing 3.1) 


5. Generated Algorithms 


AutoBayes generates appropriate algorithms from algorithm skeletons with the help 
of symbolic calculations. Typical algorithms include: 

• clustering 

• numeric optimization 

5.1 Clustering Algorithms 

AutoBayes uses clustering algorithms to generate code from certain statistical mod- 
els. Clustering deals with finding a structure in a collection of unlabeled data. A brief 
definition of clustering would be “the process of organizing objects into groups whose 
members are similar in some way” . A cluster is therefore a collection of objects which 
are “similar” and are “dissimilar” to the objects belonging to other clusters. Au- 
toBayes currently implements two such algorithms, namely k-means and the EM 
algorithm. 

5.1.1 k-Means Algorithm 

K-means (MacQueen, 1967) is one of the simplest unsupervised learning algorithms 
that solve the well-known clustering problem. The procedure classifies a given data 
set into a number (7) of clusters, where 7 is specified a priori. The main idea is to 
define 7-centroids, one for each cluster. These centroids should be placed initially in a 
cunning way because different initial locations cause different results. A good choice 
is to place them as far away from each other as possible. The next step is to take each 
point from the given data set and associate it to the nearest centroid. This induces an 
early clustering. At this point we need to recalculate 7-new centroids as barycenters 
of the clusters resulting from the previous step. For each group, this minimizes the 
objective function that is the sum of the Euclidean distances of the centroid from 
each point in the group. After we have these new centroids, the step is repeated 
by associating each data point with the new nearest centroid. These steps are done 
iteratively until the centroids do not move anymore. The output of the algorithm is 
the location of the 7-centroids. 
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5.1.2 The EM Algorithm 

An expectation-maximization (EM) algorithm is used in statistics for finding max- 
imum probability estimates of parameters in probabilistic models, where the model 
depends on unobserved latent (hidden) variables. The EM algorithm [MK97] alter- 
nates between performing an expectation (E) step, which computes an expectation 
of the likelihood by including the latent variables as if they were observed, and a 
maximization (M) step, which computes the maximum likelihood estimates of the 
parameters by maximizing the expected likelihood found during the E step. The pa- 
rameters found in the M step are then used to begin another E step, and the process is 
repeated. A detailed description of the algorithm as applied to a mixture-of-Gaussians 
problem is given in Section 8.2. 

5.2 Numerical Optimization Algorithms 

AutoBayes uses numerical optimization algorithms in order to generate code for the 
statistical model. The numerical optimization task is to find, given a single function 
that depends on one or more parameters, values for those parameters where the given 
function achieves its maximum or minimum value. For a comprehensive description 
of well-known optimization algorithms see [GMW81]. Typical numerical optimization 
algorithms that AutoBayes uses include: 

• Nelder-Mead Downhill Simplex Method 

• Golden Section Search Method 

5.2.1 Nelder-Mead Downhill Simplex Method 

The downhill simplex method [PFTV92] is due to Nelder and Mead. This method 
only requires function evaluations, but not derivatives. Therefore, it is a natural choice 
in many situations where no derivatives exist, or where they are very expensive to 
calculate. 

A simplex is a geometrical figure: in TV dimensions, it is the convex hull of TV + 1 
independent points (or vertices). In two dimensions, a simplex is a triangle. In three 
dimensions it is a tetrahedron (not necessarily a regular tetrahedron). In general we 
are only interested in simplexes that are non-degenerate, i.e., that enclose a finite 
inner TV-dimensional volume. If any point of a non-degenerate simplex is taken as the 
origin, then the TV other points define vector directions that span the TV-dimensional 
vector space. For multidimensional minimization, the best we can do is give our 
algorithm a starting guess, that is, an TV-vector of independent variables as the first 
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point to try. The algorithm is then supposed to make its own way downhill through 
the unimaginable complexity of an TV-dimensional topography, until it encounters a 
(local) minimum. The downhill simplex method must be started not just with a single 
point, but with N + 1 points, defining an initial simplex. 

5.2.2 Golden-Section Search Method 

The Golden-Section search method [PFTV92] is a numerical optimization method 
for continuous functions of a single variable which does not use derivatives at all 
(i.e., it does not require either symbolic differentiation or numerical computation or 
approximation of derivatives). The algorithm optimizes by iteratively bracketing the 
minimum by a triplet of points, a < b < c, such that f{b) is less than both /(a) and 
/(c) and b — a and c — b are related by the golden ratio. In this case we know that 
there exists a local minimum in the interval (a, c). The algorithm chooses a new point 
x, either between a and b or between b and c. Suppose, to be specific, that we make 
the latter choice. Then we evaluate f(x). If f{b) < f(x), then the new bracketing 
triplet of points is ( a,b,x ); otherwise, if f{b) > f(x), then the new bracketing triplet 
is ( b,x,c ). This process continues until the distance between the two outer points of 
the triplet is within a designated tolerance level. Note that the convergence is linear , 
meaning that each iteration gains an additional binary digit of accuracy. 

5.2.3 Initialization 

General optimization algorithms (in particular the Nelder-Mead Simplex Algorithm) 
require values for the starting point and values for the initial step size. Poor choices 
can result in poor performance of the algorithm and even non-termination. Auto- 
Bayes provides several ways for finding initial start and step values. Its behavior 
can be controlled via two pragmas, scheraa_control_arbitrary_init_values and 
schema_control_init_values. The following methods for calculation of the start 
value so and step Jo are provided; Table 5.1 shows how the pragmas need to be set. 

Range If the variable under consideration is restricted in its value range by con- 
straints to [/,..., h], then s 0 = l + (h — /)/ 2 and J 0 = {h — /)/10. 

Prior If the variable has a prior, the initial values are determined by the moments 
of the prior: So becomes the first moment, <5o the second moment. 

User Additional input arguments for the generated code are generated to provide so 
and (Jo- 


Arbitrary If no additional information is known, AutoBayes uses s 0 = 1 and 
Jo = 1- 
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Pragma 

Range 

Prior 

User 

Arbitrary 

schema_control_arbitr ary _init .values 
schema_control_init .values 

false 

automatic 

false 

automatic 

true 

user 

true 

arbitrary 


Table 5.1: Control of initialization 

5.3 Generic Optimization 

Whenever AutoBayes encounters an optimization problem (maximization or min- 
imization), it first attempts to find a closed-form solution. This can be attempted 
by symbolically calculating the derivatives, setting them to zero, and solving the re- 
sulting set of equations. In addition, the second derivatives are calculated (if they 
exist) and checked for the correct sign. If such a solution cannot be found (e.g., 
there exists no closed form solutions, or if the symbolic solver in AutoBayes cannot 
find a solution), the problem is divided into pieces which can be handled individu- 
ally. If this does not yield results, numerical optimization algorithms are instanti- 
ated. If this is not successful, AutoBayes fails to generate a program, unless the 
pragma schema_control_use_generic_optimize=true is set. In that case, a (non- 
executable) statement optimize (. . .) is generated in the final code for debugging 
purposes. 




6. Using AutoBayes 


AutoBayes is a program generator for scientific data analysis programs. The input 
problem is a file written as a specification of a data analysis problem in the form of 
a statistical model with declarations, constraints, and a goal. Then AutoBayes is 
invoked on that input hie. From the outside, AutoBayes works very similar to a 
compiler: AutoBayes reads the input hie and command-line options and generates 
an internal representation (a Bayesian network). Once the algorithms have been gen- 
erated and optimized, AutoBayes generates the code hies as well as various listing 
and documentation hies. In this reference chapter, we will discuss calling conventions, 
common command-line hags, compilation of generated code, and AutoBayes error 
messages. 

6.1 Invoking AutoBayes on an Input File 

To invoke AutoBayes on a specihc input hie, the autobayes command is used: 

% autobayes [options] [pragmas] m.odelfilename . ab 

Unless a specihc target language is requested, this call will generate C++ code from 
the input hie by default. The extension for model hies is “.ab". The names of hies 
generated by AutoBayes are derived from the name of the statistical model (given 
by the model declaration in the input specification, i.e, 

model modelname as ’ . . . ; . 

or, in one case, from the model’s filename. For the following, we assume the model 
name is modelname and the model’s filename is modelfilename. 

Each option is specihed by a hag on the command line optionally followed by an 
argument. Each pragma is introduced by the hag ‘-pragma’ followed by a pragma 
variable and its setting. The following sections describe options and pragmas in more 
detail. 

6.1.1 Generated File Names 

The hies below can be generated by AutoBayes. 
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• cpp^modelfilename.ab The model file preprocessed by cpp. 

• modelname. cc The generated C++ code. 

• m.odelname.c The generated C code. 

• modelname. cc.html The generated C++ code in htrnl format. 

• modelname. c. htrnl The generated C code. in htrnl format 

• modelname .log The default log file generated when the ‘-log’ option is speci- 
fied. A particular file can be specified using the ‘-logfile’ option. 

• modelname- design . htrnl The htrnl design document generated when the ‘-designdoc’ : 
flag is specified. 

• modelname . dot The graphical representation of the Bayes net generated when 
either the ‘-dot’ or ‘-designdoc’ flag is specified. 

• modelname. stage, list where stage is synt, iopt, inst, lang, prop, ann, or 
lopt. These are results of intermediate stages of the program synthesis pro- 
cess, generated when the ‘-list’ option is specified. The interesting ones are 
synt which is the synthesized program written in the high-level intermediate 
language, and lang, which is the transformed program just before printing out 
the C/C++. 

• modelname . stage . dump Same as the above files, but presented in the internal 
Prolog format. 

• modelname . stage .tex The BTgXfile of results of the intermediate stages of the 
program synthesis process. This file is generated when the ‘-tex’ flag is specified. 
Providing ‘synt’ as the stage generates a BTf^Xfile giving the derivation of the 
generated program. 

• sampl e _modelname. cc The C++ program synthesized to generate artificial data 
(see Section 4.4). 

• sample_m odelname. c The C program synthesized to generate artificial data (see 
Section 4.4). 

6.2 Command-Line Flags 

AutoBayes command line options are set by specifying flags with the autobayes 
command. For instance ‘-help" is a flag that can be used along with the autobayes 
command to view a list of all available flags. Appendix A.l lists all flags. 
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For example: 

% autobayes -help 
The general format is: 

% autobayes -flag flag- options 

6.2.1 Help Flag 

Using the ’-help’ flag alone will display a list of flags and their options and usages. 
For example: 

% autobayes -help 

Providing a flag as argument to the help flag, as in ‘-help flag', displays the usage 
of the option associated with the particular flag flag. 

6.2.2 Design Document Flag 

The ‘-designdoc’ flag will generate a software design document for the specified 
model in htrnl format. It will also generate in a ‘ .dot’ hie a visualization of the Bayes 
net 1 . For example: 

% autobayes -designdoc model filename . ab 

You can specify a name for the generated document by using: 

% autobayes -designdoc designdocfilename m.odelfilename . ab 
The design document contains the following information. 

• Summary 

List of File Names 

• Input Specification 

Textual Input Specification 
Graphical Representation 

• The Code Generation Process 

AutoBayes Command Line Parameters 

^ee Section 8.2.1 for an example visualization. 
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• Generated code 

Interface 

Input and Output Parameters for Generated Code 
Assertions and Error Handling 
Intermediate Code 
Final Code 

• Warnings/Errors 

Synthesis Constraints 
Compiler Warnings 

The ‘.dot’ hie can be converted to a ‘.jpg' format using the ‘dot’ command; for 
example: 

% dot -Tjpg filename, dot > filename, jpg 

6.2.3 Target Flag 

This hag gives the user the option to specify the language of the generated code. By 
default it is set to generate C++ code, but the ‘-target’ hag controls the generated 
code language. Available language and runtime environments are: 

c.standalone matlab modula2 octave spark 

For example: 

% autobayes -target c .standalone modelfilename . ab 
will generate output in C code and 
% autobayes -target matlab modelfilename. ab 

will generate C code which can be dynamically linked into the Matlab™ environ- 
ment. 

6.2.4 Artificial Data Flag 

The hag ‘-sample’ specifies generation of a program to generate artificial sample data. 
See Section 4.4. 
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6.3 Pragmas 

AutoBayes pragmas are low-lcvcl flags and commands to control specific actions in 
the AutoBayes system. Their main purpose is to help the developer and advanced 
user guide the AutoBayes system in a specific way. Pragmas are specified in the 
command line as follows: 

% autobayes . . . -pragma pragma=value . . . 

There should not be any spaces around the equal sign. The complete list of pragmas 
and allowable values for each is given in Appendix A. 2. Also, a complete list of 
AUTOBAYES pragmas can be obtained using the ‘help" option: 

% autobayes -help pragmas 

Several pragmas control aspects of the EM algorithm: em, em_log_likelihood_convergence, 
em_q_output, and em_q_update_simple; see the Appendix for details and Section 8.2.1. 

6.4 Compiling and Running the Generated Code 

AutoBayes generates standalone C code or dynamically linked Matlab™/Octave 
C/C++ code. In order to provide input and run the generated stand-alone C code, 
a driver program is needed. The generated program needs to be compiled in order to 
have it run under the MATLAB™/OCTAVE systems. 

If the code generated is going to run under MATLAB™/OCTAVE environment, then 
the following steps need to be taken after code generation: 

• Load the data 

• Call the Auto Bayes- generated statistical model code on the data 

• Generate plots and analyze the results. 

In the Matlab™/Octave environment: 

load ‘data. dat+ 

[mu, sigma, . .] = mog (...) 
plot (...) 

6.4.1 Compiling and Running the Generated Program: OCTAVE 

To run the code, we first need to compile the generated program and create an Octave 
hie. For compiling and creating an OCTAVE hie we use the mkoctfile command. 
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% mkoctfile -I$AUTOBAYESHOME/system/octave/include modelname.cc 

For the mixture of Gaussians example in Section 8.2.1: 

% mkoctfile -I$AUTOBAYESHOME/system/octave/include mog.cc 

This will produce an Octave hie. We then enter the Octave environment by typ- 
ing octave to the command line. To see the usage of synthesized code, invoke the 
generated code by entering the name of the model in the OCTAVE environment: 

octave :1> mog 

For our example, vector input is provided and parameter values are returned: 

usage: [vector c, vector mu, vector rho, vector sigma] = 
mog(vector x, 

double tolerance, 
int maxiteration) 

6.4.2 Compiling and Running the Generated Program: MATLAB™ 

To run the code, we first need to compile the generated program and create a mex-hle. 
For compiling and creating a mex-hle we use the mex command. 

% mex -I$AUTOBAYESHOME/system/matlab/include modelname.c 

Please note modelname.cc is for Octave and modelname.c is for Matlab™. 

For the mixture of Gaussians example in Section 8.2.1: 

% mex -I$AUTOBAYESHOME/system/matlab/include mog.c 

This command will produce a MATLAB™ hie; then we enter the MATLAB™ environ- 
ment by typing matlab to the command line. We call the generated code by typing 
the name of the model to the Matlab™ command-line; for example: 

» mog 

This returns the usage of the synthesized code. For the mixture of Gaussians example, 
vector input is provided and parameter values are returned: 

usage: [vector c, vector mu, vector rho, vector sigma] = 
mog(vector x, 

double tolerance, 
int maxiteration) 
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6.5 AutoBayes Error and Warning Messages 

At different stages the user may encounter different error messages. Below we have 
categorized the types of error messages the user may encounter during the process of 
using AutoBayes. 

6.5.1 Interface (command-line) Errors 

A command-line error can occur when the user sets a pragma to a wrong or uniden- 
tifiable value, or when an erroneous value is given as the argument to a flag. 

6.5.2 Syntax Errors 

A syntax-error message occurs when there is a syntax error in the specification of a 
statistical model. A syntax error could be in any segment of the model specification. 
Below is a list of syntax error messages with corresponding sections of specification 
model. 

• Commented Declarations 

Error Message: ‘error in declaration’ 

This could be a result of not following the conventions for making comments. 
The proper keyword for making comments is ‘as’. 

• Equations 

Error Message: ‘error in equation’ 

This could be a result of not using the proper operator which is ‘ : =’ for defining 
equations. 

• Distributions 

Error Message: ‘error in distribution’ 

This could be a result of not using the proper distribution operator which is ‘ ~ ’ 
for distributions. 

• Simple Declarations 

Error Message: ‘error in (const, data, datastream, output, external, 
function} declaration’ 

This could be a result of using the wrong type for declaring a variable. Make 
sure only int, nat, or double are used. 
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• Constraints 

Error Message: ‘error in constraint’ 

This could be a result of not using the proper keywords for setting constraints. 
The proper keywords are where and with. 

• Approximations 

Error Message: ‘error in approximation declaration’ 

This could be a result of not using the proper operators for setting approxima- 
tions between two terms within a given bound. The proper keywords are: 

witherror. 

• Complexity Constraints 

Error Message : ‘error in complexity declaration ’ 

This could be a result of not using the proper operators for defining complexity 
constraints. The proper keywords are: complexityof , withbound. 

6.5.3 Code-Generation Errors 

Basically these are semantic errors that can occur during the process of code genera- 
tion. A common code-generation error is ‘no code generated’. This indicates that 
no synthesized code has been generated. 

6.6 Debugging AutoBayes Specifications 

6.6.1 Running AutoBayes 

no programs generated For the given specification, no program could be generated. 
This can have a multitude of reasons. For debugging, setting the pragma 
-pragma schema_control_use_generic_optimize=true 
can help. In this case, AutoBayes generates a “call” to a generic optimizer for 
all subproblems it cannot solve. An inspection of these pieces of code can yield 
helpful insights. 

6.6.2 Running AuTOBAYES-generated Code 

• A run-time message assertion violation n_classes > 10 * n_points means 
that a constraint given in the specification is violated. For example, assertion 



6.6 Debugging AutoBayes Specifications 


63 


violation n.classes << n_points means the number of data points in the 
clustering problem is too small. Try to reduce the number of classes. 

Often, this error indicates that the data vector is in the wrong order. Try to 
call the generated code with the transposed data array. 
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7. Specification Language 


AutoBayes is a program synthesis tool for the statistical data analysis domain. Its 
input specification language is a concise description of a data analysis problem in the 
form of a statistical model; its output is optimized and fully documented C/C++ 
code which can be linked dynamically into Matlab™ and Octave environments 1 . 

In general, the format for the statistical model input (in a hie with the extension . ab) 
is as follows: 

1. Model header: model M as ’model comment’. 

2. Constant declarations 

3. Priors (hyperparameters, distributions, constraints) 

4. Data (distributions, declarations, parameters and constraints) 

5. Goal, which is estimating parameters that maximize a probability term. 

Chapter 8 gives many examples of AutoBayes models, which are almost self-explanatory. 
This chapter introduces the syntax of AutoBayes models. 

7.1 Model Declarations and Syntax 

Comments in the model follow the percent character Comments can also be 
enclosed in V*' and ‘*/’ in a C-style manner. A comment can be associated with a 
variable by writing as ’comment’ (see below). 

All declarations and statements end in a period (‘.’). 

Names of constants, parameters and statistical variables can consist of the alphabetic 
(lower and upper case) and numeric characters, as well as underscore (“_”), starting 
with a lowercase letter, e.g., n_points. Index variables, usually used to quantify over 
a vector or matrix, must start with a capital letter, e.g. x(I). This indicates that the 
declaration that it occurs in is true for all values of the variable; i.e., the declaration 
is universally quantified over I. For example, x(I) — gauss(mu(I), sigma(I)) means 

V 0 < i < length(x) : Xi 


1 Other target languages include C for stand-alone applications and Ada. 
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Sometimes a variable appears only once in a declaration, in which case, as in Prolog, 
an underscore can be used instead of a variable name, but this still means 

universal quantification. Its meaning is similar to the Matlab™ expression “1 : end'" 

In order to facilitate a style more aligned with mathematical notation, the pragma 
prolog_style can be set to false. Then, variables can start with a lower or upper- 
case letter and index variables must start with an underscore e.g., X(_i). 

A variable may be specified with one of four modes : 

• const, in which case the value of the variable will be given in the specification 
with a ‘ : =’ or is an input to the generated function (or in the case it is the 
length of input data, it is computed from the input data given to the generated 
function); 

• data, in which case it will be an input to the generated function; 

• output, in which case it will be an output of the generated function; 

• (empty). 

The input and output synopsis for the generated code is established according to the 
following rules: 

• data variables are inputs. 

• constants are inputs unless 

they have an assigned value using ‘ : =’ 

they are defined by the size of a data variable, e.g., data x(0. . n). 

• parameters for specific purposes are inputs. These are generated by Auto- 
Bayes according to internally-used algorithms. Typical examples are error 
thresholds (tolerance) and upper bounds for the number of iterations (maxiteration). 

• estimated parameters in the goal statement are outputs. 

• variables explicitly declared “output” are outputs. 

• system-generated information, e.g., a convergence vector or the log-likelihood, 
are outputs. This is controlled by the generated algorithm or specific pragmas. 

• all parameters, input and output, are sorted alphabetically. 

Scalar variables, and the entries of vector or matrix-valued variables, may be of type 
double, int, or nat, where the latter means the value is an integer greater than or equal 
to zero. 
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Scalar variables are declared by writing: 
mode type varname as ’comment’. 

where the as ’comment’ is optional but recommended. 

Vector variables are declared by writing: 
mode type varname(0 .. max_index)as ’comment’. 

Matrix variables are declared by writing: 

mode type varname(0 .. max_row .index, 0 .. max_column_index)as ’comment’. 

All lower indices must be 0 (C-stylc array indexing). 

A variable may be constrained by either specifying a distribution for it, a value, or by 
specifying range restrictions: 

• var ~ distribution 

• expr ~ distribution (the expression involves the random variable) 

• var := value 

• where range-restriction . Range restrictions may be x in min .. max or a boolean 
expression involving the variable and =, <, >, =<, >=, «, or >>. 

Table 7.1 lists the predefined distributions. 

The discrete distribution takes as argument a probability vector v, say of length 
n. For a discrete random variable X that ranges over [0 . . . n — 1], the constraint 
x~discrete(v) means 

VO < j < n : Pr( X = j ) = v(j) 

The discrete distribution is used in conjunction with the vector construct. The 
construct vector(I := 0 .. n— 1, expr(I)) creates an n element vector whose ith element 
is the value of expr(i). We have seen an example use in the Iris model in Listing 3.1: 

i class .assignment (_ ) ~ discrete ( vector ( I := 0 .. n_classes — 1, phi ( I ) ) ) - 


This specifies that the probability of an entry in the class assignment vector being j 
is 0(j), i.e., 

VO < i < lcngth(class_assignment) : Pr ( class .assignment ( i ) = j) = 0(j) 

AutoBayes recognizes these operators in arithmetic expressions: +, -, *, /, ** (ex- 
ponentiation), sqrt, log, exp (e x ), sum(I := min .. max, arith_expr(I)), 
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Distribution 

Keyword 

Bernoulli 

x ~ bernoulli(p) 

Beta 

x ~beta(alpha, beta) 

Binomial 

x ~binomial(n, p) 

Cauchy 

x ~ cauchy(alpha, beta) 

Discrete 

x ~ discrete(p(_)) 

Dirichlet 

x ~ dirichlet (alpha) 

Exponential 

x ~exponential(lambda) 

Gamma 

x ~gamma(k, theta) 

Gauss 

x ~gauss(mu, sigma) 

Inverse Gamma 

x ~invgamma(k, invtheta) 

Mixture 

x ~mixture(E cases [vail — >distl, ...]) 

Poisson 

x ~ poisson(lambda) 

Uniform 

x ~ uniform (min_val .. max_val) 

vonMises 

x ~vonmises(N, mu, kappa) 

Weibull 

x ~weibull(alpha_val, beta.val) 


Table 7.1: List of Probability Distributions 


cond(bool_expr, true_arith_expr , false_arith_expr ), and expr cases [val — > distribution, ... ]. An 
example of the use of log for transforming data is given Listing 8.5 (Page 81). We 
have already seen in the Iris model in Listing 3.1 (Page 32) a use of the sum con- 
struct to specify the constraint that (p is a probability vector that sums to 1. The 
cond construct returns the value of the true_arith_expr if bool_expr evaluates to 
true; otherwise it returns the value of f alse_arith_expr. It is used in the random 
walk model in Listing 8.16 (Page 106) and in the change point detection model in 
Listing 8.17 (Page 107). The cases construct is used in the mixture model given in 
Listing 8.14 (Page 102). 

The AutoBayes goal expression is of the form: 
max pr( {varsl} | {vars2} )for { vars3 }. 

Data variables typically appear in varsl of the expression. The goal is to find values 
for the variables vars3 that maximize the conditional probability expression. 

Directives to AutoBayes, like pragmas, may be included in the model by writing 
pragma pragma=value. A list of directives is given in Appendix A. Arbitrary Prolog 
code may also be included in the model. It can be used to give hints to AutoBayes 
as to how to produce a closed-form solution to a subproblem. An example of this is 
given in Section 8.2.4 (Page 99). 
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Here we provide the set of keywords for the AutoBayes input model syntax. 


pragmas 

version header 

include 

comments 

distribution 

assignment 

vector declaration 

matrix declaration 

expression operators 

vector constructor 

types 

mode 

constraints 

goal 


pragma 

version 

include 

as ’comment’ 
expr ~ distribution 
var := value 
var(0 .. max_index) 

var(0 .. max_row_index, 0 .. max_col_index) 

+, — , *, /, **, sqrt, log, exp, sum, cond, cases 
vector(I := 0 .. n, expr(I)) 

double, int, nat 
const, data, output 
where 

max pr({data_l,...} | {param_l, ...} ) for { param_l, 


} 
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8. Statistical Models — Examples 


In this chapter, we will present a larger number of AutoBayes specifications from 
various application areas. The purpose of this chapter is twofold. On one hand, these 
examples have been collected to demonstrate the capabilities of the AutoBayes 
system. For several examples, we also present the detailed mathematical derivation of 
the problem, as it is automatically generated and typeset (in PTgX) by AutoBayes 
by specifying the -tex synt flag option. These derivations demonstrate that for 
the synthesis of a customized data analysis algorithm, often substantial amounts of 
mathematical derivations have to be performed before the problem can be solved — 
something traditional libraries cannot offer. 

The presented examples also show how compact and flexible the specification language 
for AutoBayes is. Table 8.2 (Page 118) lists all examples from this section, the 
lengths of their specifications, hie names (in the AutoBayes distribution), and the 
length of the generated C++ code (including comments). Although these numbers 
should be viewed with caution, they demonstrate AutoBayes’s excellent code-to- 
specihcation ratio. Also, the point is that an AutoBayes specification is a high- 
level, understandable description of a statistical problem, whereas the program is a 
low-level description of its solution; there are cases where the solution can be expressed 
succinctly in closed form, but only after a lengthy derivation. 

The second purpose of this chapter is to provide the user with a wealth of pre-dehned 
specifications that can be used, modified, and refined for a specific purpose. 


Notes 


• Automatically generated derivations are enclosed between begin autogener- 
ated document and end autogenerated document. 

• AutoBayes generates internal variables in many places, e.g., index variables. 
They have the syntax pvn, where n is a number. For example, pv65 is such an 
autogenerated variable name. 
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• AutoBayes’s Gaussian is defined with respect to standard deviation; i.e., both 
x~gauss(mu, sigma) and x~gauss(mu, sqrt(sigma_sq)) mean x ~ iV(/i, cr 2 ) . This is a 
deviation from the form used by Matlab™ and Octave. 

• In order to facilitate pretty-printing of the generated explanations, variables 
ending in _sq are assumed to be squares; e.g., sigma_sq is typeset as a 2 . This 
rule is for typesetting in BTf^X only and such variable names have no specific 
semantics. 

8.1 Introductory Examples 

8.1.1 Normal Distributed Data 

This very primitive example shows the basic input syntax for AutoBayes and demon- 
strates its symbolic calculation capabilities: given n data points xi, . . . , x n , which are 
Gaussian distributed with an unknown mean fi and variance cr 2 , i.e., Xi ~ N(n,<j 2 ), 
the task is to estimate the unknown /j, and a 2 . Listing 8.1 shows the AutoBayes 
specification for this problem. 

1 model normal as ’Normal Distributed Data’. 

2 

3 const nat n as ’NUMBER OF DATA POINTS ’ . 

4 

5 double mu as ’UNKNOWN mean’ . 

6 double sigma_sq as ’UNKNOWN VARIANCE ’ . 

7 where 0 < sigma_sq . 

8 

9 data double x(0..n— 1) as ’given data points’. 

10 

11 x(_) ~ gauss (mu, sqrt ( sigma_sq ) ) . 

12 

13 max pr( x | {mu, sigma_sq} ) for {mu, sigma_sq}) . 

Listing 8.1: AutoBayes specification for normal distributed data. 

Line 1 specifies the name of the model. AutoBayes will generate a function with 
the name normal. In Line 3, the number of data points is specified to be a constant. 
It has to be known during runtime. Lines 5-6 specify the unknown parameters fi and 
a 2 with the constraint that a 2 > 0. 

In Line 9, the input data variable x is declared. This vector of data is provided at 
runtime to the synthesized function. From this vector, the value of n is calculated 
automatically. 
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Line 11 describes in statistical terms the distribution of the data. The underscore 
_ indicates that each element of the data vector has the same distribution. Finally, 
Line 13 specifies the data analysis task: hireling the values of /i and a 2 that maximize 
the probability of the data given /i and a 2 . 


It is obvious that the expected result for this problem is the following two equations 
(this problem can be solved in closed form): 


ft 



^ J>,-/r) 2 

i = 1 


AutoBayes produces a piece of code which exactly contains these sums calculated 
with nested for loops. However, these equations are not hard-coded into the system; 
rather, a detailed derivation of these equations is produced. In the following, the main 
steps of the derivation are presented, exactly as they are produced by AutoBayes 
(with the option -tex synt) 1 : 


begin autogenerated document 

The conditional probability P(x \ n, a 2 ) is under the dependencies given in the model 
equivalent to 


— 1+n 

Y[ P(xi\n,a 2 ) 

i = 0 

The probability occurring here is atomic and can thus be replaced by the respective 
probability density function given in the model. This yields the log- likelihood function 


— 1+n 

log n exp 

?.=o 


i~\ {Xi 


V 


Lx 




/ 


1 


v^r (a 2 


) 


1 

2 


which can be simplified to 

1 1 1 2 
— - n log 2-| — -n log 7r H — -n log a + 

—1+n 

- 2 +T 1 + 

1=0 


1 The only changes applied were to break long formulas. AutoBayes automatically typesets a 
variable whose name is a Greek letter as the Greek letter, and a variable whose suffix is Lsq’ is 
typeset as the square of the prefix. 
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This function is then optimized w.r.t. the goal variables // and a 2 . 

The summands 

n log 2 

- ^ n log vr 

are constant with respect to the goal variables // and tr 2 and can thus be ignored for 
maximization. 

The factor 

1 

2 

is non-negative and constant with respect to the goal variables // and a 2 and can thus 
be ignored for maximization. 

The function 

— 1+n 

-1 n log a 2 + -1 (u 2 ) 1 ^(-l/i-fa;*) 2 

i=0 

is then symbolically maximized w.r.t. the goal variables fi and a 2 . The partial differ- 
entials 


— 1+n 

= -2 /in (a 2 ) 1 + 2 (a 2 ) 1 Xj 

i=0 
— 1+n 

= -1 n (ff 2 ) -1 + (ff 2 )~ 2 (-1 H + Xj ) 2 

i=0 

are set to zero; these equations yield the solutions 

—1+n 

H = n _1 Xi 

i= 0 
— 1+n 

a 2 = nT 1 (—1 /i + Xj) 2 

i=0 


df_ 

(9/i 

<9a 2 


end autogenerated document 
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8.1.2 Working with Priors 

A more Bayesian flavor can be given to the above example if we assume that we have 
some prior knowledge about one or both of the unknown parameters n or a 1 2 3 4 5 6 7 8 9 . 

1 model normal_known_variance as ’Normal model with prior on mean 

2 AND KNOWN VARIANCE ’ . 

3 

4 const nat n as ’NUMBER OF DATA POINTS ’ . 

5 

6 % Priors (hyperparameters & distribution) 

7 

8 const double mu_0 as ’prior on mu (mean of means) ’. 

9 const double tau_0 as ’prior on mu (variance of means)’. 

10 where 0 < tau_0 . 
n 

12 double mu ~ gauss (mu_0, sqrt ( tau_0 ) ) . 

13 

14 const double sigma.sq as ’SIGMA SQUARED’. 

15 where 0 < sigma_sq . 

16 

17 data double x( 0 ..n— 1 ) as ’given data points’. 

18 

19 x(_) ~ gauss(mu, sqrt ( sigma.sq ) ) . 

20 

21 max pr({x, mu} | sigma_sq) for {mu}. 

Listing 8.2: Specification for normal distributed data with priors. 


Listing 8.2 shows the specification for the case where the variance is given exactly, and 
we have some prior knowledge about the mean, i.e., /i ~ N(n 0 , sqrtijo)). This model 
has been adapted from [Bis95]. Again, AutoBayes finds the closed-form solution: 


a 


H = H o 


To 


(<t“ + n t 0 ) (cr 2 + ri-points tq) 


— l+n 

£ 


Xi 


i=0 


1 model normal as ’Normal model with CONJUGATE PRIORS ’ . 

2 

3 const nat kappa.O as ’number prior data points ’ . 

4 where 0 < kappa_0 . 

5 const double mu_0 as ’prior on mu (mean of means) ’. 

6 

7 double mu ~ gauss(mu_0, sqrt ( sigma_sq/kappa_0 ) ) . 

8 

9 const double sigma_0_sq as ’prior ON SIGMA_SQ ’ . 
where 0 < signra_0_sq. 


10 
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n const double delta_0 as ’degree of belief in sigma_0_sq ’ . 

12 where 0 < delta.O . 

13 

14 double sigma.sq ~ invgamma( delta.O /2+1 , sigma_0_sq * ( delt a_0 /2) ) . 

15 where 0 < sigma_sq . 

16 

17 const nat n_points as ’NUMBER OF data points ’ . 
is where 0 < n_points . 

19 

20 data double x ( 0 . . n_points — 1) as ’CURRENT data points (known)’. 

21 x(_) ~ gauss(mu, sqrt ( sigma.sq ) ) . 

22 

23 max pr({x, mu, sigma_sq}) for {mu, sigma_sq}. 

Listing 8.3: Specification for normal distributed data with conjugate priors 

Listing 8.3 shows the specification for the case, where conjugate prior information is 
available for fi and a s q. The priors are specified as the respective conjugate priors for 
the Gaussian distribution, i.e., [i is distributed as Gaussian itself, and a 2 is distributed 
as an inverse gamma. 

The prior on ju takes a constant juo as mean; this pri or me an has been established from 
Ko prior observations. The standard deviation \foT~K o is largely a mathematically 
convenient choice which in the end eliminates all dependencies between // and cr 2 . 
Other values should be possible as well. The prior on a 2 takes constants <7g and <5 0 as 
parameters where <5 0 can be considered to be the degree of belief that a 2 is the “right” 
variance. This model is adapted from the normal model with unknown parameters in 
[BS94, GCSR95], 

begin autogenerated document 

The joint probability P(/x, a 2 , x) is under the dependencies given in the model equiv- 
alent to 


— 1 + 71 

P(n\ a 2 )P(a 2 ) JJ P(ay|/i,a 2 ) 
*= o 


All probabilities occurring here are atomic and can thus be replaced by the respec- 
tive probability density functions given in the model. This yields the log-likelihood 
function 


( 

exp 


a 2 

^ (* 2 0* 2 J 

v y 




log exp 


2\-(l+l+§ A) 
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1 d<Wo) 1+i<5 ° -i+" i 

. y i I I exp T r 

V2^(a 2 K^)i r( i + 1 <5o) -• \ (ff 2 )“ 2 J V2^(* 2 )’ 

which can be simplified to 

-llogr(l + ^h 0 ) + ~ log a 2 + ~ log 2 + -^h 0 a 2 OT 1 + 

-^h 0 log 2 H — ^ ho loga 2 + -^K 0 (a 2 ) -1 (/x + -1 /i 0 ) 2 + n log2 + 

1 , 1 , 2 1 . 

--nlogn + ~-nloga +--log7r + 

1 111 

~2 ( a2 ) _1 (“ 1 /^ + a h) 2 + 2 5 ° lo § 5 o+ 2^o log <^0 + 2 log Ko + log h 0 + log cxq 

i = 0 

This function is then optimized w.r.t. the goal variables /i and a 2 . 

The summands 

-1 log T(1 + ^ h 0 ) 

3 

"2 ‘° g2 

- ^ h 0 log 2 

- log 2 

- ^ n log vr 

logTr 
^ h 0 log ho 
^ loga 2 

^ log «0 

log ho 
log^o 

are constant with respect to the goal variables /i and a 2 and can thus be ignored for 
maximization. 
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The factor 

1 

2 

is non-negative and constant with respect to the goal variables /i and a 2 and can thus 
be ignored for maximization. 


The function 

—5 log a 2 — 1 5o ctq (a 2 ) 1 + -1 5 0 log a 2 + -1 k 0 (a 2 ) 1 {/i + -1 /i 0 ) 2 + -1 n log a 2 + 

— l+n 

-i +T 1 y (-+ + + 2 

«=o 

is then symbolically maximized w.r.t. the goal variables fi and a 2 . The partial differ- 
entials 

— 1+n 

cr' 2 ) 1 + -2 /in (a 2 ) 1 + 2/to/io (a 2 ) 1 + 2 (a 2 ) 1 y 3+ 

j=0 

= -5 (a 2 ) -1 + -15 0 (o -2 ) 1 + -1 n (a 2 )' 1 + 5 0 (To (a 2 ) -2 + 

< 7(7 

— 1+n 

«o + 2 ) _2 + + -1 /i 0 ) 2 + (o- 2 ) -2 ( _1 ^ + 

j=0 

are set to zero; these equations yield the solutions 


d/i 


^ J o 

— = -2n 0 fi\ 


— 1+n 

li = Kofio +o + n ) _1 + +o + n)~ l y Xj 

i = 0 

= 5o ctq (5 + <fo + fi) + Ko (5 + 5q + n) 1 (|U H — 1 P-o) 2 + 

— 1+n 

(5 + 5 0 + n) _i y (-ifjL+Xi) 2 

i= 0 


end autogenerated document 
8.1.3 Combining Measurements 

This example was motivated by the introduction to Kalman filters presented in Section 
1.5 of [MAY79]. Suppose we have two imperfect measuring devices. Each is modeled 
as returning a Gaussian-distributed measurement with a known bias and standard 
deviation around the actual value. If a measurement is made with each, how should 
the two measurements be combined to obtain a better estimate of the true value? The 
AutoBayes model for this situation is given in Listing 8.4. 
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1 model biased.measurements as ’Estimate true value given two biased 

MEASUREMENTS ’ . 

2 

3 const double bias_l . 

4 const double bias_2 . 

5 

6 const double sigma_l . 

7 where 0 < sigma.l . 

8 const double sigma_2 . 

9 where 0 < sigma_2 . 

10 

n double mu. 

12 

13 data double x_l . 

14 data double x_2 . 

15 

16 x_l ~ gauss (mu + bias_l , sigma_l). 

17 x_2 ~ gauss (mu + bias_2 , sigma_2). 

18 

19 max pr( {x_l , x_2} | {mu, bias_l , bias_2 , sigma_l , sigma_2} ) for {mu 

}■ 

Listing 8.4: AutoBayes specification for two biased measurements 

AutoBayes is able to solve this problem symbolically, yielding 

Xia'l + x 2 a f — biasicr| — bias 2 a^ 

(rl + tr* 


begin autogenerated document 

The conditional probability P(x'i , | /i) is under the dependencies given in the model 

equivalent to 


P(xi | fl) P(X2 I n) 

All probabilities occuring here are atomic and can thus be replaced by the respective 
probability density functions given in the model. This yields the log-likelihood func- 
tion 


log exp 


~ (Ai - (bias i + p)) 2 ^ 


CTi 


(A - (bias 2 + p)) 2 \ 


exp 


/ 


(To 


<y i 


V2ti <7 2 V2n 
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which can be simplified to 

-1 log 2 + -1 log 7T + -1 log <j\ + -1 loga 2 + — ^ crfi 2 (x\ + -1 biasi + -1 /i) 2 

+ — ^ cr f 2 + -1 bias 2 + -1 /i) 2 

This function is then optimized w.r.t. the goal variable /i. The summands 

— 1 log 2 

— 1 log 7T 
1 logai 

-1 loger 2 

are constant with respect to the goal variable // and can thus be ignored for maxi- 
mization. The factor 

1 

2 

is non-negative and constant with respect to the goal variable /i and can thus be 
ignored for maximization. 

The function 


— 1 cr 1 2 (ay H — 1 bias i H — 1 /i) 2 -I — 1 a 2 2 (a; 2 -I — 1 bias 2 H — 1 /u) 2 
is then symbolically maximized w.r.t. the goal variable /i. The differential 

—2 bias i crfi 2 H — 2 bias 2 erf 2 H — 2 /i erf 2 -| — 2 [i erf 2 + 2 ay erf : 2 + 2 r 2 erf 2 
is set to zero; this equation yields the solution 

— 1 biasi er 2 (er 2 + er 2 ) -1 -f — 1 bias 2 er 2 (er 2 + er 2 ) -1 + ay (er 2 + cr 2 )^ 1 + er 2 (er 2 + er^) -1 

end autogenerated document 

As part of its optimization phase, AutoBayes generates code that computes common 
subexpressions only once, as is evident in the generated C/C++ code: 

pvO = sigma_2 * sigraa_2; 
pvl = sigma_l * sigraa_l; 

mu = (x_l * pvO + x_2 * pvl - bias_l * pvO - bias_2 * pvl) / (pvO + pvl) ; 
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8.1.4 Transformations: log-normal and square-normal 

Data which has undergone some transformations can be modeled in AutoBayes. 
A typical example is log-normal distributed data (Listing 8.5). Other examples for 
transformation of the input data are: 

• the log-it transformation. Here, the ratio of ay/(l — ay) is log-normal distributed. 
The corresponding AutoBayes line is as: 

log(x(I)/(l— x(I))) ~gauss(mu, sqrt(sigma_sq)). 

• the square transformation with a distribution. Here, the square of each data 
value, x % . is normal distributed. In AuToBAYES-syntax, we write 

x(I)**2 ~gauss(mu, sqrt(sigma_sq)). 

Note that the current version of AutoBayes only allows the user to specify a limited 
set of transformations. 

1 model log_normal as ’Log— NORMAL MODEL’. 

2 

3 ... % SPECIFICATION AS ’’NORMAL” ABOVE 

4 

5 data double x ( 0 . . n_points — 1) as ’CURRENT data points (known)’. 

6 log(x(_)) ~ gauss(mu, sqrt ( sigma_sq ) ) . 

7 

s max pr(x | {mu, sigma_sq}) for {mu, sigma_sq}. 

Listing 8.5: AutoBayes specification for log-normal distributed data. 


8.1.5 Other distributions: Cauchy 

Of course, data are not always Gaussian distributed. Rather, some other probability 
density function for the data (or even a mixture) is required. A typical data anal- 
ysis task which requires a non-Gaussian data model has been adapted from [Siv96], 
attributed to [Gul88]: 

“A lighthouse is somewhere off a piece of straight coastline [of given length] 
at a position light x along the shore and a distance light y out at sea. It 
emits a series of short highly collimated flashes at random intervals and 
hence at random azimuths. These pulses are intercepted on the coast by 
photo-detectors [each at position x t ] that record only the fact that a flash 
has occurred, but not the angle from which it came. Nfi ashes have so far 
been recorded at positions {x(f)}. Where is the lighthouse?” 

Listing 8.6 captures this problem and synthesizes code to estimate the position of the 
lighthouse. 
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1 model lighthouse as ’Lighthouse example [Sivia96] 

2 

3 const nat length as ’length of the shore’. 

4 const nat n.flashes as ’number of flashes’. 

5 

6 % 

7 % Priors (hyperparameters & distribution) 

8 % 

9 double light _x as ’x— position of the lighthouse’. 

10 double light _y as ’y— position of the lighthouse’. 
n 

12 light_x ~ uniform(— length /2 , length/2). 

13 light_y ~ uniform ( 0 , lengtli/ 2 ). 

14 

15 % 

16 % Data 

17 % 

is data double x ( 0 . . n.flashes - 1) as ’x— POSITIONS OF triggered sensors’. 

19 x ( _ ) ~ cauchy (light_x, light_y). 

20 

21 % 

22 % Goal 

23 % 

24 max pr(x | { light _x , lighty }) for { light _x , light _y } . 

Listing 8.6: Lighthouse example 


8.1.6 Discrete 

Typical examples with discrete probability distributions always include models of 
tossed coins. Tossing one coin with a bias can be modelled using the Bernoulli dis- 
tribution (Listing 8.7). A real- valued bias with values between 0 and 1 is defined to 
describe the bias. Since the coin is tossed only once, the value of head can only be 0 
or 1 (coin lands on head). This (somewhat degenerate) example obviously calculates 
the value of head as 1 if bias >0.5 and 0 otherwise. This example shows that Auto- 
Bayes can find simple closed-form solutions. Listing 8.8 shows the straight-forward 
generalization to tossing a biased coin n times. Note that in both specifications, the 
declaration of a scalar statistic variable (heads) and the definition of its pdf can be 
done in one statement. 

If the bias is not known, AutoBayes can estimate it using prior information. In this 
case, a beta-distributed prior on the value of bias is used. Its parameters are a prior 
on getting head or getting tails. Listing 8.9 shows the AutoBayes specification. 
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1 model biased.coin as ’Biased coin toss model’. 

2 

3 data double bias as ’bias towards heads’. 

4 where 1 > bias . 

5 where 0 < bias . 

6 

7 nat heads ~ bernoulli ( bias ) . 

8 where 1 > heads . 

9 where 0 =< heads . 

10 where heads in 0..1. 

11 

12 max pr(heads | bias) for heads. 

Listing 8.7: AutoBayes model for tossing a biased coin. 


1 model biased_coins as ’Biased coin toss model (multiple tosses)’. 

2 

3 const nat n as ’NUMBER OF TOSSES ’ . 

4 

5 data double bias as ’bias towards heads’. 

6 where 1 > bias . 

7 where 0 < bias . 

8 

9 % Head count 

10 nat heads ~ binomial(n, bias), 

n where n > heads . 

12 where 0 =< heads . 

13 where heads in 0..n. 

14 

15 max pr(heads | {n, bias}) for heads. 

Listing 8.8: AutoBayes model for repeatedly tossing a biased coin. 


1 model 

2 

3 const 

4 

5 const 

6 

7 const 

8 


biased .coins .prior as ’Biased coin toss model with prior’. 

nat n as ’number of TOSSES ’ . 

nat prior.tails as ’PRIOR COUNT OF TAILS’. 

where 0 < prior.tails . 
nat prior.heads as ’PRIOR COUNT OF heads’. 
where 0 < prior.heads. 


9 

io double bias as ’bias towards heads’. 
n where 1 > bias . 

12 where 0 < bias . 

13 

14 bias ~ beta( prior .tails , prior.heads). 

15 
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16 nat heads ~ binomial ( n , bias). 

17 where n > heads . 

is where 0 =< heads . 

19 where heads in 0..n. 

20 

21 max pr({heads, bias}) for {heads, bias}. 

Listing 8.9: AutoBayes model for tossing a biased coin with prior. 


8.2 Clustering Examples 

8.2.1 Mixture of Gaussians 


The “mixture of Gaussians” [EH81] specification is a good example of how a straight- 
forward and simple problem specification unfolds into a complex, iterative algorithm. 
Listing 8.10 shows the specification of the problem: n.points data points x t have been 
generated by n_classes different sources. The data from each source c are normal 
distributed, i.e., x t ~ N(/j, c ,a %). All parameters of the model, i.e., /i c , <r c , and the 
relative class frequency (j) c are unknown and must be determined. 


1 model mog as ’Mixture of Gaussians ’ . 

2 

3 % Model parameters 

4 const nat n.points as ’Number of data points ’ . 

5 where 0 < n .point s . 

6 const nat n_classes as ’Number of classes ’ . 

7 where 0 < n .classes. 

8 where n .classes <C n .point s . 


9 

10 

n 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 


% Class probabilities 
double phi ( 0 . . n.classes — 1) . 

where 0 =sum(I := 0 .. n.classes— 1, phi(I))— 1. 

% Class parameters 
double mu(0 .. n_cl asses — 1) . 
double sigma ( 0 .. n.classes — 1) . 
where 0 < sigma(_). 

% Hidden variable 

output nat c ( 0 . . n.points — 1) as ’CLASS ASSIGNMENT vector’. 
c(_) ~ discrete ( vector ( I := 0 .. n.classes — 1, phi(I))). 

% Data 

data double x ( 0 .. n.points —1) . 

x(I) ~ gauss (mu( c ( I ) ) , sigma ( c ( I ) ) ) . 


26 
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27 max pr(x|{sigma, mu, phi}) for {sigma, mu, phi}. 

Listing 8.10: AutoBayes model for a mixture of Gaussians. 

The individual parts of the specification have already been described in Section 3.1 on 
page 31. Since this example is relevant for a large number of AutoBayes problems, 
the mathematical derivation and the assembly of the clustering algorithm will be 
discussed in detail. 

When AutoBayes processes this model, a number of internal steps are executed 
by AutoBayes in order to solve this optimization task. In a first step, a Bayesian 
Network (BN) for the problem is constructed. Figure 8.1 shows a graphical repre- 
sentation of the resulting Bayesian Network. AutoBayes generates this graphical 
representation whenever it is called with the -designdoc or -dot command flags. 



Figure 8.1: Bayesian Network for Mixture of Gaussians. This graph has been auto- 
generated by AutoBayes and had been visualized using GraphViz. 
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Statistical variables are shown as vertices in this network, known (data) variables 
as shaded ellipses. A hidden variable (in our case c) is displayed as a rectangle 
with rounded edges. Since all variables are actually vectors, their dimension and i.i.d 
status is shown with rectangles in the background, which are labeled with the variable 
dimension. As usual, arrows mark the dependencies between the variables. From this 
figure, it is obvious that the problem can be broken down into two subproblems, which 
can be solved separately: 

max pr(c | phi) for phi 

max pr(x | {c, mu, sigma}) for {mu, sigma} 

In its schema-base, AutoBayes contains a number of rules on how to partition and 
simplify extended Bayesian Networks. Furthermore, AutoBayes recognizes that this 
problem describes a discrete latent variable problem (often called a hidden variable 
problem). In our case, the (known) data variable is x, the hidden variable the class 
membership vector c. 

With these two observations, AutoBayes can now start to solve the problem. Being 
a hidden variable problem, the application of an instance of a discrete EM algorithm 
[MK97] can solve this problem. 

The model describes a discrete latent (or hidden) variable problem with the latent 
variable c and the data variable x. The problem to optimize the conditional probabil- 
ity P(x \ n, 4>, a) w.r.t. the variables /i, 0, and a can thus be solved by an application 
of the (discrete) EM-algorithm. 

The algorithm defines and maintains as central data structure, a class membership 
table q, such that q,-j is the probability that data point X; belongs to class j, i.e., 

Qij = p ([c; =j])- 

The algorithm consists of an initialization phase for q (Section 8.2.1), followed by a 
convergence phase, the EM loop, followed by the extraction of the hidden variable c. 

Initialization of EM 

The q matrix is of the size n_points x n.classes and must be initialized before the 
EM-loop starts. A number of different initialization methods can be selected using 
the pragma era: 

center In this mode, a center-based initialization is attempted. This means that 
for each class 0 < j < n_classes — 1, a center-point is chosen randomly, i.e., 
ctj = x rnc [ 2 ■ Then, all elements of q are initialized with the normalized distance 

2 Please note that this kind of initialization could result in data points could be picked more than 
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between the data value and the chosen point ctj. For 0 < i < n.points — 1 

~ x i ) 2 

( ^' 3 ~ v^n.points-i n— ^ 

J2k = =o V ( ct i “ x k) 

sharp_class This initialization starts with a random assignment of the class vector 
c, i.e., Cj = rnd for 0 < j < n.classes — 1. Then the q matrix is initialized 
such that qi Ci = 1 and zero everywhere else (0 < % < n.points — 1). This means 
that the EM algorithm starts with a q matrix that is zero almost everywhere. 

fuzzy_class This initialization also starts with a random assignment of the class 
vector c, i.e., Cj = rnd for 0 < j < n_classes — 1. Then, however, only a 
portion 6 of the probability is put into q lCi , the rest is uniformly (but randomly) 
distributed over the rest of the elements in q. In order to obey the constraint 
that ^ ~2 k qik = 1, the following algorithm is used: 

j Ci 
j = Ci 

ncvpref This option lets AutoBayes select one of the initialization methods. 

Note that there are many possibilities to initialize the q matrix. AutoBayes can be 
easily extended by schemas to perform other kinds of initialization. 

The EM Loop 

The EM loop is the central optimization iteration of the algorithm. Each iteration is 
comprised of two individual steps, the E-step and the M-step. The M-step maximizes 
the probability expressions and estimates values for the unknown parameters // , cr, 
and (j). The E-step calculates the “expectation” and updates the q matrix. In contrast 
to many other implementations, AutoBayes first starts with the M-step followed by 
the E-step. 

The iteration loop is executed until 

(a) a given maximal number of iterations has been performed. This number is given 
as an input parameter maxiteration to the generated code, OR 

(b) if the iteration metric E is smaller than the given parameter tolerance. 


l 

n.classes 


(1 — 5) x rnd 


% = 


1 _ 'r-^mclasses-i 

2-^k=§,k^j Qik 


once, potentially leading to numerical instability. 
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In any of these two cases, the EM loop is terminated and the required parameters are 
extracted and returned. 

Although a number of different ways to calculate the “error” E could be used, Au- 
toBayes currently supports two mechanisms. 

• The error is calculated as the sum of the normalized differences between all 
parameters of the current and the previous run. For run t, we have 


E 


n_classes-i 


£ 


"R 

1 

"R 

1 

__i_ 

1/4 1 

+ l/T 

I \ 


n_classes-i 


£ 




n_classes-i i 

-+ v B 

-^r 1 ! 

1*31 

+ 

i _t-i 
1 a 3 

i u m 

+ IC 1 


• The normalized difference of the log- likelihood L between the current and the 
previous run is taken, i.e., 

Ez I Lt - Lt ~ l \ 

\E\ + \ L*- 1 ! 

This iteration metric can be activated by setting 

-pragma em_log_likelihood_convergence=true. 

Although the effort in calculation of this metric is higher, the EM algorithm con- 
verges usually much faster. 

In the following, we present the autogenerated derivation of the M-step, as it contains 
“the meat” of the problem. 

begin autogenerated document 

The problem to optimize the conditional probability P(c, x j p,, 0, a) w.r.t. the vari- 
ables /i, 0, and a can under the given dependencies by Bayes rule be decomposed into 
two independent subproblcms: 


max P ( c | 0) for 0 
max P (x | c, /i, a) for /q a 


The conditional probability P(c j 0) is under the dependencies given in the model 
equivalent to 


— 1 +n -points 

n PN'W 

i = 0 

The probability occurring here is atomic and can thus be replaced by the respective 
probability density function given in the model. Summing out the expected variable 
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Cpvio yields the log- likelihood function 


— 1 -\-n -points 
-points — r 

log 

j G dom Ci~q(i,j) 




k = 0 


which can be simplified to 


— l+ii-classes — l+n_points 

Y i°g& Y 

i=0 j = 0 

This function is then optimized w.r.t. the goal variable <fi. 
The expression 

— 1 +n_classes — 1 +n_points 

Y log ^ Y q (- h *) 

i = 0 j=0 

is maximized w.r.t. the variable (j) under the constraint 

-l-\-n -classes 


o = — 1 + E 


i = 0 


using the Lagrange-multiplier 
The summand 


is constant with respect to the goal variable c f) pV 2i and can thus be ignored for maxi- 
mization. The function 

— 1 -\-n -classes — 1 -\-n -classes — 1 -\-n -points 

- 11 Y & + Y log (f)i Y 

i = 0 i=0 j = 0 

is then symbolically maximized w.r.t. the goal variable 4> pV 2i- The differential 

— 1 -\-n -points 

— l / + 4> pt ,2l E q(i,pv 21) 

i = 0 

is set to zero; this equation yields the solution 

— 1 -\-n -points 

r 1 Y q(i,pv2 1) 

i = 0 
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The conditional probability P (x \ c, p, a) is under the dependencies given in the model 
equivalent to 


— l+n -points 

]j[ P(xi\ci,n,a) 
2=0 


The probability occurring here is atomic and can thus be replaced by the respective 
probability density function given in the model. Summing out the expected variable 
Cpvio yields the log-likelihood function 


— 1 +n -points 

E i=0... — 1+ri-points — r 

log 1 I 

j G dom Ci~q(i,j) 


4 (x k ~Vc k ) 2 \ 


exp 


A:=0 


a 


Ck 


<j Ck V2n 


which can be simplified to 


— 1 +n_classes — 1 +n_points 

— 1 Y J log cTj q(j, i) 4 — - n _ points log 2 -| — - n _ points log7r + 

i = 0 j = 0 


1 

2 


— l+ri-dasses — 1 +n_points 

Y Y (-1 Hi + Xj) 2 q{j,i) 

i = 0 j = 0 


This function is 
The summands 
1 

— - n .points 
1 

— - n -points 


then optimized w.r.t. 

log 2 
log 7T 


the goal variables p and a. 


are constant with respect to the goal variables fi and a and can thus be ignored for 
maximization. 

Index decomposition: The function 


— 1 +n_classes —l+n-points — 1 +n_classes — 1 +n_points 

-i Y l °z a i Y q(j, i ) + - 2 Y Y (-!^ + x j) 2 q(j,i) 

2=0 j= 0 2=0 j=0 

can be optimized w.r.t. the variables /i t and a, element by element (i.e., along the 
index variable i) because there are no dependencies along that dimension. 
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The factor 


ri-dasses 

is non-negative and constant with respect to the goal variables p pV 3 i and <J pV 3 i and 
can thus be ignored for maximization. 

The function 

— l+n _points — 1 +n -points 

-1 log 0 p„ 3 i Y q(i,pv31) + --<7^3! Y (-Ippv3i+Xi) 2 q(i,pv31) 
i = 0 i=0 

is then symbolically maximized w.r.t. the goal variables p pv si and <J pV 3 i. The partial 
differentials 


df 

d f^pv 31 

df 

d(Jp V 31 

are set 


— 1 +n -points — 1 -\-n -points 

= -1 /-V,31 a p 2 n Y q(^P v31 ) + a pv3i Y x t q(i,pv31) 

i = 0 i=0 

— 1 -\-n -points — 1 -\-n -points 

= ~ l(7 pv3\ Y qO’P v31 ) + a pv31 Y (-Ippv3l+Xi) 2 q(i,pv3l) 

1=0 i = 0 

to zero; these equations yield the solutions 


— 1 -\-n -points 

Ppv 31 = cond( 0= Y <?(bpu31), 

i = 0 

fail( division .by. zero), 

— l+n -points — 1 -\-n -points 

Y q(i,pv3 1)- 1 Y x,q(i,pv 31 )) 

i= 0 i= 0 

— 1 -\-n -points 

cr pv 3 i = cond( 0 = Y^ qOiP v3 Oi 

i=0 

/ ail {division .by .zero ) , 

^ — 1 -\-n -points — 1 -\-n -points 

4 I Y (-lp pv3 i + Xi) 2 q(i,pv31)^ Y q(i,pv31)~^) 


i=0 


i=0 


end autogenerated document 


Extracting the ffidden Variable 

After the EM-loop has terminated, it has calculated the most likely values of the 
(unknown) parameters //, a, (f), as well as the (internal) matrix q. If the most likely 
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class assignment, the hidden variable c is desired, AutoBayes performs the following 
calculation. For 0 < i < n.points — 1 

Ci = argmaXj 

The q matrix itself is returned using the pragma -pragma em_q_output=true. Please 
note that the name of this internal matrix cannot be accessed from the AutoBayes 
specification language. 

8.2.2 Multivariate Mixture of Gaussians 

Whenever data sets with more than one statistical variable needs to be clustered, a 
multivariate mixture model has to be used. In AutoBayes, multivariate mixture 
models can be specified in two ways: 

• each individual variable is specified by its name; means and standard deviations 
for each variable are returned separately 

• all variables are packed into a 2-dimensional matrix. This matrix has the di- 
mension: N dimensions x N datapoints ■ AutoBayes then returns a matrix for the 
means and standard deviation of size N dimensions x A classes ■ 

A typical example for multivariate clustering, the Iris data set, has been discussed 
in detail in Section 3. There, we used the approach of packing the given data into a 
matrix “data". 

All multivariate mixture models can have variants with respect to the independence of 
the dimensions. In general, a multivariate Gaussian is defined as X ~ iV(/x, E) where 
fi is a vector of the means, and E is a N d i me nsions x Afdi mens ions matrix, the covariance 
matrix. Often, however, only the diagonal elements of the covariance matrix are of 
interest, i.e., E^ = 0 for i ^ j. Please note that the current version of AutoBayes 
cannot handle the case with full covariance, i.e., N(/i, E). 

Listing 8.11 shows the AutoBayes specification of a multivariate mixture of Gaus- 
sians. Its input is a data matrix sim_data of the dimensions number of data points 
times number of features (or variables). The goal of the specification is to estimate, 
given the desired number of classes n_classes , the most probable class assignment and 
the class parameters (/i,a 2 ). This model is the basis model for clustering of software 
simulation data as described in Section 1.2.1 and [GBSMB08]. 

Typically, this model is processed with the following flags and pragmas: 
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-instrument display and save the error value during each iteration of the EM al- 
gorithm. The data can be used to monitor the convergence behavior of the 
algorithm. 

-pragma em_log_likelihoocLconvergence=true This option forces the EM algo- 
rithm to use the current log-likelihood as its convergence metric. The algorithm 
stops, when the change in the log-likelihood becomes smaller than the given 
threshold tolerance. 

-pragma em= . . . selects the initialization routine for the EM algorithm as discussed 
above in Section 8.2.1. By default, a center-based initialization is selected. 

As with any iterative statistical or numerical algorithm, there are some possible 

caveats, when AutoBayes is used on mixture models. 

• The number of classes must be specified before the run of the EM algorithm. 
This model thus does not estimate the most probable number of classes. How- 
ever, with a simple Matlab™/Octave script, which iterates over a range 
of classes (e.g., 2:10) and monitoring of the returned log- likelihood, the best 
number of classes can easily be estimated. 

• If the given number of classes is larger than is prompted by the data, the al- 
gorithm may return multiples of classes with identical parameters. The reason 
is that the generated algorithm does not automatically reduce the number of 
classes in case they become empty. Reducing the number of classes avoids this 
problem. 

• Numerical instability and return of ’NaN’ can occur if the range of values in a 
data variable becomes too large. The reason is that internally, ratios of expres- 
sions of the form e x ~^ have to be formed for input data x. If the value of x — /i 
becomes too large or to small, NaNs or division-by-zero exceptions can occur. 

This problem can circumvented by normalizing the data prior to processing. 
Typical ways to do this in Matlab™/Octave using a data vector x is: 

— x n = produces a 0-1 normalized data vector x n . 

max(tc)— min(x) ^ 

— x n = produces a IV(0, 1) normalized data vector x n . 



1 model mult_cluster as 

2 ’simple multivariate clustering model’. 

3 

4 const nat n_variables as ’Number of variables’. 

5 const nat n_points as ’Number of data points ’ . 

6 
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7 const nat n_classes as ’Number of classes ’ . 

8 where 0 < n .classes. 

9 where n .classes <C n .point s . 

10 

n 

12 double phi ( 0 . . n.classes — 1) as ’class probabilities’. 

13 where 0 =sum(I := 0 .. n.classes — 1, phi(I))— 1. 

14 

is % Class parameters 

16 double mu( 0 . . n.variables — 1, 0 . . n.classes — 1) . 

17 

is double sigma ( 0 .. n.variables — 1 , 0 .. n.classes —1) . 

19 where 0 < sigma 

20 

21 output nat class .assignment (0 . . n.points — 1) as ’hidden variable’. 

22 class .assignment (_ ) ~ discrete ( vector ( I := 0 .. n.classes — 1, phi(I))). 

23 

24 data double sim.data ( 0 . . n.variables — 1, 0 . . n.points — 1) . 

25 

26 sim.data (C, I ) ~ gauss (mu(C, class .assignment ( I ) ) , sigma(C, 

class.assignment(I))) . 

27 

28 % Goal 

29 

30 max pr({ sim.data } | { phi , mu, sigma }) for { phi , mu, sigma }. 

Listing 8.11: Multivariate clustering of Gaussians 


8.2.3 Working with Priors 

As with other statistical models (see Section 8.1.2), mixture models can have priors. 
In this section, we will discuss how conjugate priors on the means and the standard 
deviations for each class can be used within AutoBayes (for a one-dimensional 
problem). Listing 8.12 shows the entire specification. Most of the specification is 
similar to the standard one-dimensional mixture of Gaussians. The priors are defined 
by the additional (known) variables (each variable is a vector over the classes): n o 
as the mean of the means, Kq as confidence of /i 0 , and Co and <5o as the prior for the 
standard deviation. Then, the model parameters /i and a 2 are distributed as: 

Id ~ N(n 0 , n 0 a 2 ) 

and 

a 2 ~ r _1 (l + ^cr 0 A 0 ) 

The goal of the specification now looks different, namely 
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i max pr({mu, sigma_sq , x} | phi) for {phi, mu, sigma_sq}. 


For comparison, the goal specification of the standard mixture model is 
i max pr(x|{phi, mu, sigma}) for {phi, mu, sigma}. 


The internal, autogenerated derivation is now getting much more involved, as all the 
information about the priors has to be taken into account. Below, we show the second 
half of the derivation, namely for maximizing P(/j,a 2 ,x\c) under the independence 
assumption, i.e., the class frequency (0) has been factored out already. 

begin autogenerated document 

The conditional probability P(/j,a 2 ,x \ c) is under the dependencies given in the 
model equivalent to 


-1 +C -1 +G -1+JV 

n p (/bi°f) n n p w 

2=0 2=0 2=0 

All probabilities occurring here are atomic and can thus be replaced by the respective 
probability density functions given in the model. Summing out the expected variable 
Cpvii yields the log-likelihood function 


ErAA~ J (. J ) i °gnd: c ^p 


(«o (Cfe) 5 ^ 2 


K 0 


2 ^ f V2^ 


n — 1 +c 

k = 0 ex P 




k) 1 


n — 1 -\-N 

k = 0 eX P 


~~2 I K a 

a k J 

2 ( Xk _ ^ c k ) 
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r(i + 2« 


cr; 


c*0 


V / 2 t r(cx 2 J 


which can be simplified to 
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— 1 C log H — 1 C log r(l + | ho) + 

C log ho + C loger 0 + -§ Y^=t C lo § a i + 

— | C log 2 H — 7 ; 5q C log 2+ 

-§ ^0 Eri^ o 2 ) -1 + -§ <*o Erio +c io g°f + -I c ^g^ 

-\N log 2 H \ N log7T + -|Ko 2 E“j^ C (M/x 0 + /^) 2 (<T 2 ) _1 + 

4E“i +c iog^ 2 Ei + %0'.0+ 

-I Ei^KT 1 E"= 1 o +JV (-i^ + ^) 2 ^,*)+ 

| C log h 0 + | h 0 <7 log cr 0 


This function is then optimized w.r.t. the goal variables /i and a 2 . 
The summands 


-1 C logK 0 

— 1 C log T(1 T ^ ho) 

C log h 0 

C log cr 0 

— ^ C log 2 

-i^Clog2 
C logTT 
~^N log 2 
-^N log 7T 
^ h 0 C' log ho 
^ h 0 C log (j 0 

are constant with respect to the goal variables /j and cr 2 and can thus be ignored for 
maximization. 

Index decomposition 


The function 
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-I E7io" c lo § °f + E7i 0 +c (*?) _1 + 

-|<^o Eiio^iogof + -|«o 2 E i i: c (-^o + mn^) x + 

-§ Eio +C iogo? E7i + %tt>0+ 

-I Ei +C (^) _1 E7= 1 o +JV (-i^ + ^) 2 ^'^) 

can be optimized w.r.t. the variables /i p „ 57 and <j 2 v37 element by element (i.e., along 
the index variable pv37 ) because there are no dependencies along that dimension. 

The function 

-5 logtf v37 + -15 0 <h) (<^ 3 t )" 1 + -!^o log ^37 + - 1 log ^37 q(i,pv37)+ 

-1 2 (-1 /i 0 + p pv37 ) 2 (o-J, 3 7 ) _1 + -1 (<7^ 37 ) _1 Er= 1 o ’ JV ( _1 ^37 + ^) 2 q(i,pv37) 

is then symbolically maximized w.r.t. the goal variables p pV 37 and cr 2 v37 . The partial 
differentials 


df 

dp P v37 


df 

do'pv 37 


-l+JV 


2 Ppv 37 ^0 2 (^ 37 ) + “ 2 Ppv37 (^ 37 ) 1 E q(i,pv 37) 


4 = 0 


-i+jv 


+2/i 0 k 0 2 (aj, 37 ) + 2 (ct 2 w37 ) q(i,pv37) 


4=0 


-l+JV 


5 (^ 37 ) X + -l Soi^sr) X + - 1 (^ 37 ) 1 q(hpv 37) 


4=0 


+ 5 0 °0 (^ 4 , 37 ) + K 0 2 (~1 To + P P v37 ) 2 (^ 37 ) 

-l+JV 

+ (^ 37 ) ■ 2 E (— 1 Ppv37 + Xi) 2 q(i,pv37) 

4=0 

are set to zero; these equations yield the solutions 

— l+JV -l+JV -l+JV 

p P v37 pO (Id - 4 E q(i,pv 37)) 1 + (1 + K o g(i,pu37))‘ - 1 E Xj q(i,pv 37) 

2=0 2=0 2=0 

-1+N 

cr p„37 = <Wo (5 + 5 0 + XI g(*>pv37)) -1 + «o 2 (5 + <J 0 

4=0 


-l+JV -l+JV 

+ ^ ?(bP' y37 )) _1 (- 1 J u o + ^37) 2 + (5 + (5 0 + ^2 q(hP v 37)) _1 

2=0 2=0 
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-l+JV 

x y] (-1 /i pv37 + Xj) 2 q(i,pv37) 

i= 0 

end autogenerated document 

1 model mgp_mu as ’Mixture of Gaussians (with priors)’. 

2 

3 const nat n_points as ’Number of data points ’ . 

4 where 0 < n.points . 

5 

6 const nat n_classes as ’Number of classes ’ . 

7 where 0 < n .classes. 

8 where n .classes <C n.points. 

9 
10 

n double phi ( 0 .. n.classes —1) . 

12 where 0 = sum(I := 0 .. n.classes— 1, phi (I))— 1. 

13 

14 

15 const double mu_0 as ’PRIOR ON MU’. 

16 const double kappa.O as ’PRIOR ON MU’. 

17 where 0 < kappa.O . 

18 

19 double mu(0 . . n .classes — 1) . 

20 mu(I) ~ gauss (mu.O, sqrt ( sigma.sq ( I ) ) * kappa.O). 

21 
22 

23 const double sigma.O as ’PRIOR ON SIGMA.SQ’. 

24 where 0 < sigma.O . 

25 const double delta.O as ’PRIOR ON SIGMA.SQ’. 

26 where 0 < delta.O. 

27 

28 double sigma.sq ( 0 . . n.classes — 1) ~ invganima( delta.O /2 + 1 , sigma_0*( 

delta.O /2) ) . 

29 where 0 < sigma_sq(_). 

30 

31 

32 nat c ( 0 . . n.points — 1) as ’CLASS ASSIGNMENT vector’. 

33 

34 c(_) ~ discrete ( vector ( I := 0 .. n.classes— 1, phi(I))). 

35 

36 data double x ( 0 .. n.points — 1) . 

37 

38 x( I ) ~ gauss (mu( c ( I ) ) , sqrt(sigma_sq(c(I)))) . 

39 

40 max pr({mu, sigma.sq, x} | phi) for {phi, mu, sigma.sq}. 


Listing 8.12: Mixture of Gaussians with priors. 
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8.2.4 Working with Non-Gaussian and Multiple Distributions 

Until now, we only considered mixture models, where data were Gaussian distributed. 
In many applications, this does not have to be the case. Typical examples include 
discrete features with binomial distribution, or sensor readings with an exponential 
distribution. In general, two cases can be distinguished: 

• different distributions along the different data dimensions. Here, different data 
sources have different distributions, but the distribution is the same for all drawn 
data. An example is a two-dimensional data set of temperature (Gaussian dis- 
tributed) and thermostat status (on/off, discrete binomial distribution). 

• data points have a different distribution according to the data source (=class). 
For example, a sensor returns data which is exponentially distributed. Back- 
ground noise (from the same sensor), however, is Gaussian distributed. The 
statistical model now is required to separate the “good” data from the back- 
ground noise. 

Both kinds of problems can be solved by AutoBayes, as we show in the following 
sections. 

Non-Gaussian mixtures 

The specification of a mixture model for non-Gaussian distributions looks very sim- 
ilar to one for Gaussian distribution (Listing 8.10). Only the line, specifying the 
distribution of the data has to be modified and, of course, the names and numbers of 
distribution parameters has to modified accordingly. Table 8.1 shows a list of available 
(and tested) distributions available. Please note that for some distributions, no closed 
form solution for the M step exists or can be found by the symbolic system. In these 
cases, a numerical optimization routine (usually a Nelder-Mead Simplex Algorithm) 
is instantiated. Other distributions, most notably the von Mises-Fisher distribution 
requires additional “help” to find a closed form solution. These expressions are given 
in the input specification, as Listing 8.13 shows. This hint to solution is taken from 
a published paper [BaJGS05], where this result is a major result of the paper. 

1 ... 

2 data double x ( 0 . . n_dim — 1 ,0 . . n_points — 1) . 

3 x ( J , I ) ~ vonmises ( n_dim , mu( J , c ( I ) ) , k(c(I))). 

4 

5 % 

6 assert ( synth_formula ([ k ( I ) , mu(_,_)]> Formula, Constraint, 

7 block ( 

8 local ( [ 

scalar (PV1, double, []) , 


9 
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Name 

Notation 

Closed Form 

Remarks 

Bernoulli 

x ~ bernoulli(p ) 

Y 


Beta 

x ~ beta(a, (5) 

N 

i 

Binomial 

x ~ binomial (n, p) 

Y 

2 

Cauchy 

x ~ cauchy (x, y ) 

N 

i 

Exponential 

x ~ exp( A) 

Y 


Gamma 

x ~ gamma(k , 6) 

Y 

k known 

Gamma 

x ~ gamma(k , 6) 

N 

i 

Gauss 

x ~ gauss(y , a 2 ) 

Y 


Poisson 

x ~ poisson(X) 

Y 


vonMises 

x ~ vonmises(y , k ) 

Y 

3 

Weibull 

x ~ weibull (a, (3) 

N 

i 


Table 8.1: Different distributions for mixture models. For some distributions, closed- 
form solutions are found by AutoBayes, for others not. 1 AutoBayes has to be 
called with -pragma schema_control_arbitrary_init_values=true to obtain iter- 
ative solution. 2 PATCHED version of AutoBayes necessary. 3 solution requires 
customized schema (see text). 


10 

11 

12 

13 

14 

15 

16 

17 series 

18 

19 

20 
21 
22 

23 

24 

25 

26 

27 

28 

29 

30 

31 

32 


scalar (PV2, double, []) , 
scalar (R, double, []) , 
scalar (II , int , [ ] ) , 

scalar ( 12 , int , [ ] ) , 

vector(V, double, [dim(0,n_dim — 1)], []) 


assign (PV1, 0 , [] ) , 

for ( [ idx (II ,0 , n_clim — 1 ) ] , 

series ( [ 

assign ( select (V, [II]) , 0, []) , 
for ([idx (12, 0, n_points — 1)], 

assign.plus (select (V, [II]) , 

*([q(I2 , I) , x(Il , 12)]) ,[]) , 

[]), 

assign.plus (PV1, *([select(V, [II]), select (V, [II]) 
]) , []) 
h []) , 

[]) , 

assign (PV1, sqrt (PV1) ,[]) , 
assign (PV2, 0 , [] ) , 

for ( [ idx ( II , 0 , n_points — 1)], 

assign_plus (PV2, q(Il , I), []), 
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33 

34 

35 

36 

37 

38 

39 

40 

41 

42 

43 

44 

45 

46 

47 

48 

49 

50 

51 

52 

53 



[])’ 

for ( [ idx (II ,0 , n_clim — 1) ] , 

assign (select (mu, [11,1]), select (V, [II]) / PV1, [] ) , 

[]), 

assign (R, *([PV1, PV2 ** (-1)]) , []) , 
assign ( select (k ,[ I ] ) , 

(( * ( [R, n_dim] ) - *([R, R, R] ) ) / 

(1 - * ( [R, R]))), [ 

comment ([ ’Approximation of k according to ’, 

’ [ Banerjeeetal03 ] . ’ ] ) ] ) 

],[]),[ 

comment ([’ Optimization of the expression ’, 
expr (Formula) , pp_nl , 

’under the constraints ’, expr ( Constraints ) , 

’has been given explicitely in the specification.’, 
’For a detailed derivation of this solution see ’ , 

’ [Banerjee et al 2003]. A (recursive) approximation ’, 
’for k is used : ’ , 

expr(k_i = (r*n — r * r * r ) /(I — r * r ) ) 


Listing 8.13: Expressions to support vonMises-Fisher distributions. 


Mixture of Distributions along Variables 

Using different distributions along the different variables can be specified in Auto- 
Bayes in a straight-forward way. Different variable names are used for the different 
distributions as in the following specification snippet 

1 data x_g(I) ~ gauss (mu( c ( I ) ) , sqrt ( sigma_sq ( c ( I ) ) ) . 

2 data x_e(I) ~ exponential ( lambda ( c ( I ) ) ) . 

3 ... 


Then the goal statement looks like 

max pr({x_g,x_e}|{phi, mu, sigma, lambda} for {phi, mu, sigma, lambda}. 

A typical example, where this kind of models is necessary, when the data set consists 
of measurements from different sensors: some of the measurements are Gaussian dis- 
tributed (e.g., pressure), whereas others are discrete, e.g., valve-open, switch-position. 
For example, for a rocket the current thrust might be Gaussian distributed, whereas 
the boolean flag “motor on/off” is certainly not. In many cases, a boolean variable (or 
a discrete variable in general) can be modelled as a Gaussian by adding Gaussian noise 
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to it: x ab (I) = x(I) + normal rn d(0,<Jn Oise ). A more concise and correct specification, 
however, would directly define the input data vector x using a binomial distribution, 
where the probability p is unknown and different for each class c: 

i x(I) ~ binomial (1, p(c(I))). 

Mixture of Distributions along Classes 

For this specification variant the AutoBayes construct cases is used. The follow- 
ing specification snippet shows the central part of a mixture of beta and Gaussian 
distributed data. The full specification is shown in Listing 8.14. 

1 data nat x ( 0 . . n_points — 1 ) . 

2 

3 x ( I ) ~ mixture(c(I) cases 

4 [ 0 — > beta (a , b) , 

5 1 — > gauss (mu, sigma) 

]h 

In this example, we have two classes. Data belonging to Class 1 are Beta distributed 
with parameters a and b and those, belonging to Class 2 are Gaussian distributed. 

Please note that for this specification, set the pragma scheraa_control_arbitrary_init_values 
to true in order to allow AutoBayes to produce a numerical solution. 

1 model mix_beta_gauss as ’Disjoint Mixture between Beta and Gaussian’. 

2 

3 const nat n_points as ’Number of data points ’ . 

4 where 0 < n_ points . 

5 

6 % Mixing proportions 

7 double phi . 

8 where 0 < phi . 

9 where phi < 1 . 

10 

n % Parameters 

12 double a . 

13 where 0 < a . 

14 double b . 

15 where 0 < b . 

16 

17 double mu. 
is double sigma . 

19 where 0 < sigma . 

20 

21 % Hidden variable 

22 nat c ( 0 . . n_points — 1) ~ bernoulli ( phi ) as ’CLASS ASSIGNMENT vector’. 
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Listing 8.14: Mixture of Betas (Class 0) and Gaussians (Class 1). 


8.2.5 Multinomial Principal Components Analysis (MPCA) 

This is a fc- means version of the algorithm, without sparse vectors. For details 
see [BFG03]. 

1 model mpca as ’Multinomial Principle Component Analysis (PCA) ’ . 

2 

3 const nat n_points as ’Number of points’. 

4 where 0 < n_ points. 

5 const nat n.classes as ’Number of classes ’ . 

6 where 0 < n .classes. 

7 where n .classes <C n_points. 

8 const nat n_feats as ’Number of words in each document’. 

9 where 0 < n.feats . 

10 const nat n.words as ’Number of distinct words’. 
n where 0 < n_words . 

12 

13 const double alpha ( 0 .. n.classes — 1) as ’dirichlet parameters’. 

14 

15 % 

16% PARAMETERS: DISTRIBUTION OVER WORDS 

17 % 

is double omega (0..n_classes — 1 , 0. . n .words — 1) . 

19 where 0 =< omega 

20 where 1 = sum(J := 0 .. n_words — 1 , omega ( _ , J ) ) . 

21 

22 % 

23 % HIDDEN VARIABLE M THE TOPIC PROPORTIONS 

24 % 

25 double m(0.. n .points — 1 ,0.. n.classes —1) . 
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26 where 0 =< m( _ , _ ) . 

27 where 1 =sum(J := 0 .. n .classes — 1 ,m( J) ) . 

28 

29 % 

30 % HIDDEN VARIABLE K THE ASSIGNED TOPIC FOR EACH WORD 

31 % 

32 nat k ( 0 . . n.points — 1 ,0. . n.feats — 1) . 

33 where 0 =< k ( _ , _ ) . 

34 where k(_,_) < n .classes. 

35 

36 k ( I , _ ) ~ discrete(vector(J := 0 .. n.classes — 1 ,m( I , J ) ) ) . 

37 

38 % 

39 % DATA 

40 % 

41 data double x ( 0 . . n.points — 1 ,0. . n.feats — 1) . 

42 where 0 =< x ( _ , _ ) . 

43 where x(_,_) < n.words — 1. 

44 

45 x ( I , J ) ~ discrete ( vector (K := 0 .. n.words — 1 , omega ( k ( I , J ) ,K) ) ) 

46 

47 max pr ({x} | {nr, omega}) for {m, omega}. 

Listing 8.15: AutoBayes model for MCPA 
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8.3 Time Series Analysis 

The models discussed so far, are time independent. In nature and engineering, how- 
ever, many statistical processes are time series. This means they are of the form 
x to , . . . , x t , x t +A t, • ■ where t is the time, which is incremented in discrete steps of At. 
Please note that there are many different kinds of time series. Here, we only discuss 
discrete time series. 

Typical examples for discrete time series are sequences of events, or sequences of 
(noisy) sensor measurements made at a certain sampling rate. Typical data analy- 
sis problems are concerned with the detection of basic model parameters (and their 
change over time). 

Although the underlying mechanism of Bayesian Networks (BN) is very powerful 
to handle time series data, AutoBayes’s capabilities in handling of such data is 
currently fairly restricted. The following sections illustrate which analyses on time 
series can be performed with AutoBayes. 

The most striking restriction for AutoBayes is that all data have to be processed in 
batch mode, i.e., all data x to , . . . , x t , . . . , must be presented to the generated code 
as one vector of data. In contrast, recursive algorithms can process each data value ay 
individually at a given time. Thus, recursive data analysis algorithms are much more 
amenable to processing streaming data, i.e., data coming in at a specific rate. 

8.3.1 Random Walk 

A simple random walk can be described by the following equations: 

^o = 0 

x t = x t - 1 + r] for t > 0 

For a standard random walk, r/ is Gaussian distributed as r] r-'w' iV(0,<7 2 ). A biased 
random walk is described by a similar set of equations. However, a noisy drift factor 
pushes the values of x t toward a specific “direction” . We have 

^o = 0 

x t = x t -i + b + v for t > 0 


where b is the bias and v ~ A(0,u 2 ). 

With this information, we can construct the AutoBayes model shown in Listing 8.16. 
In this model, a random walk with mpoints data points is processed. The aim of the 
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model is to estimate the drift drift_rate and the noise drift -error given the data drift . 
Thus the goal of the specification is given as 

1 max pr( { drift } j { drift- rate, drift_error} ) 

2 for { drift_rate , drift_error }. 


The distribution of the data is specified exactly as shown in the equations above. Here, 
the cond keyword is used to distinguish the first data element from the subsequent 
ones. 

i drift(I) ~ gauss (cond ( I>0 , dr i f t ( I — 1) ,0 )+dr ift _r at e , drift-error ). 


1 model walk as ’simple random walk with bias’. 

2 

3 const nat n_points . 

4 where 0 < n_points . 

5 where n_points > 1. 

6 

7 % PARAMETERS 

8 double drift_rate as ’rate of drift per time splice’. 

9 double drift-error as ’standard deviation of drift per time slice’. 

10 where drift_error > 0. 
n 

12 % distribution 

13 data double drift ( 0 . . n_points — 1) as ’drift OR GYRO ANGULAR error’. 

14 d r i ft ( I ) ~ gauss (cond (I>0, drift (I — l),0) + drift_rate , drift-error ). 

15 

16 

17 max pr( { drift } | { drift- rate, drift_error} ) 

is for { drift_rate , drift-error }. 

Listing 8.16: AutoBayes model for a biased random walk. 


8.3.2 Change Point Detection 

An important task in the analysis of time series is the detection of an abrupt change. 
Here, the most probable index t , 0 < t < N is estimated where a change occurs. 
Classical examples include accident rates in coal mining, which changes abruptly 
after a new safety measure has been introduced, or estimating the point in time when 
sensor readings go bad. 

Listing 8.17 shows the AutoBayes specification (adapted from [OF96]) to detect the 
most probable change point. Here the process is described by 

_ f N(n i,cr 2 ) for t < t sw 
\ N(n 2 , a 2 ) for t > t sw 
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In this simple model, the (unknown) noise remains constant over the change. 

1 model hinckley as ’Gaussian change point analysis’. 

2 

3 const nat n_points . 

4 where 0 < n_points . 

5 

6 nat switchpt . 

7 where switchpt in 1 . . n_points — 2. 

8 % 

9 % The following constraint are implied by the range constraint but 

not 

10 % YET INFERRED . . . 

11 % 

12 where 0 < switchpt. 

13 where switchpt < n _ points — 1. 

14 where switchpt < n_ points. 

15 

16 double mul . 

17 double mu2 . 

18 

19 double sigma.sq . 

20 where 0 < sigma_sq . 

21 

22 data double x ( 0 . . n_points — 1) . 

23 

24 x(I) ~ gauss(cond(I < switchpt , mul, mu2) , sqrt ( sigma_sq ) ) . 

25 

26 max pr(x|{mul, mu2, sigma_sq , switchpt}) for {mul, mu2, sigma_sq , 

switchpt}. 

Listing 8.17: AutoBayes model for a simple detection of a change point 

Of course, variations of this change-point detection model can be specified. In the 
following, we will just mention some ideas and leave the exact specification as an 
exercise for the reader. 

Instead of a change in the mean value n, the noise characteristics a 2 can change 
abruptly. This can be the case if a sensor suddenly produces a large amount of noise 
(e.g., due to a broken cable or damaged amplifier. Then, we have 

_ f N(n, of) for t < t sw 
\ N(n, of) for t > t sw 

Also, the detection of a change-point from a constantly growing value to a constant 
value can be specified easily. A practical example for such a specification is the 
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detection of the CAS-mach transition in aircraft trajectory (Chapter 1.2.2). Here, the 
values of the data stream are growing with a constant (but unknown) rate x r , until it 
switches over to a constant (again unknown) value x c . The mathematical formulation 
for this problem is (assuming the noise a 2 is constant and known). 

_ f N(x c + x r (t - t sw ),a 2 ) for t < t sw 
\ N(x c , a 2 ) for t > t sw 

A full specification of this problem will be given in the next section, where we will 
discuss finding the most probable change point in two statistical process variables. 

8.3.3 Change Points in Multiple Variables 

The detection of CAS-mach transition as discussed in Chapter 1.2.2 calls for a monitor- 
ing of two variables: the calibrated air speed (CAS) and the mach number. Listing 1.3 
on page 19 shows how these variables develop, when the aircraft is climbing: before 
the change point t 0 , the aircraft climbs with a constant (but noisy) airspeed cas 0 . 
Due to the physics of the atmosphere, at the same time, the mach number increases 
linearly with an unknown rate mach r . After the aircraft passed the (unknown) tran- 
sition point to, the mach number will remain constant macho and the airspeed will 

for t < to 
for t > to 
t 0 ) for t < t 0 
for t > to 

In order to obtain the most likely unknown parameters, we have to optimize for all 
parameters simultaneously by 

max Pr((cas t , mach t ) [macho, mach r , caso, cas r ) 

Listing 8.18 shows the full specification for this problem. 

1 model climb_transition as ’MAG->CAS transition for climb SCENARIOS’. 

2 

3 const nat n_points . 

4 where 0 < n_points . 

5 

6 nat t _0 . 

7 where t_0 < n_points — 1. 

8 where 2 < t_0. 


decrease linearly. We obtain the two formulas: 


CCLSt — 


macht = 


cas o 

caso — cas r (t — to) 
macho ~ mach r (t - 
macho 
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9 where t_0 in 3 . . n.points — 3. 

10 

n double mach_0 . 

12 double mach.r . 

13 double cas_0 . 

14 double cas_r . 

15 

16 const double sigma_sq . 

17 where 0 < sigma_sq . 

18 

19 data double mach ( 0 . . n_points — 1) . 

20 data double cas ( 0 . . n_points — 1) . 

21 

gauss ( 

cond (I < t_0 , 

mach_0 — ( I— t _0 ) *mach_r 
mach_0 


22 mach ( I ) 

23 

24 

25 

26 

27 

28 cas (I) 

29 

30 

31 

32 

33 

34 

35 max pr ({mach , cas } | { mach_r , mach.O , cas_0 , cas_r , t _0 } ) 

36 for { mach_0 , mach_r , cas_0 , cas_r , t _ 0 } . 


), 

sqrt ( sigma.sq ) ) . 

gauss ( 

cond (I < t _0 , 
cas_0 , 

cas_0 — (I— 1_0 ) * cas.r 

), 

sqrt ( sigma_sq ) ) . 


Listing 8.18: AutoBayes specification for the detection of the CAS-mach transition 


The heart of the algorithm, which is generated by AutoBayes is code for solving 
a large and complicated quadratic equation. By abbreviating subexpressions, which 
occur more than once, the code can be kept compact. If additional information about 
the unknown parameters are known, we can add prior information to the specification. 
Listing 8.19 shows that only few lines of the specification have to be added. Basically, 
we specify that the unknown parameters cas r and mach r are Gaussian distributed 
around some known mean Hmach r , Hca Sr and a certain confidence n m ach r i K cas r - 

1 model climb_transition_prior as 

2 ’MAG->CAS TRANSITION FOR CLIMB SCENARIOS WITH PRIORS ’ . 

3 

4 const nat n_points . 

5 where 0 < n_points . 

6 

7 nat t _0 . 

8 where t_0 < n_points — 


1 . 



110 


Statistical Models — Examples 


9 where 2 < t_0. 

10 where t_0 < n_points . 

n where t_0 in 3 . . n_points — 3. 

12 

13 double mach_0 . 

14 double mach_r . 

15 const double mu_mach_r . 

16 const double kappa_mach_r . 

17 where 0 < kappa_mach_r . 

18 

19 mach_r ~ gauss ( mu_mach_r , sqrt ( sigma_sq )* kappa_mach_r ) . 

20 

21 double cas_0 . 

22 double cas_r . 

23 const double mu_cas_r . 

24 const double kappa_cas_r . 

25 where 0 < kappa_mach_r . 

26 

27 cas_r ~ gauss ( mu_cas_r , sqrt ( sigma_sq )* kappa_cas_r ) . 

28 

29 const double sigma_sq . 

30 where 0 < sigma_sq . 

31 

32 data double mach ( 0 . . n_points — 1) . 

33 data double cas ( 0 . . n_points — 1) . 

34 

35 macli(I) ~ gauss ( 

36 cond (I < t_0 , 

37 mach_0 — ( I— t _0 ) *mach_r , 

38 mach_0 

39 ) , 

40 sqrt ( sigma.sq ) ) . 

41 cas ( I ) ~ gauss ( 

42 cond (I < tO , 

43 cas_0 , 

44 cas_0 — ( I— t _0 ) * cas _r 

45 ) , 

46 sqrt ( sigma_sq ) ) . 

47 

48 max pr ({mach , cas , mach_r , cas_r } | { mach_c , cas_c , switch_pt}) 

49 for {mach_c, mach.r , cas_c , cas_r , switch_pt}. 

Listing 8.19: Specification for the detection of the CAS-mach transition with priors 
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8.3.4 Kalman Filters 

Kalman filters are recursive least-square algorithms for the estimation of a process 
state, given a noisy process model and noisy measurements [GA01, BH97]. The auto- 
matic generation of code for Kalman filters is the domain of AutoFilter [TOMS04], 
However, simple variants of Kalman filters can be easily specified using AutoBayes. 
One restriction, however, should be noted: traditionally, Kalman filter algorithms 
work on-line, i.e., they process one measurement or temporal update step at at time. 
AutoBayes can only generate batch-mode filters, where all data of the time series 
is present at the same time (given as a vector to the algorithm). 

1 model kalman as ’Simplest possible Kalman filter’. 

2 

3 const nat n_points . 

4 where 0 < n_points . 

5 

6 data double likelihood . 

7 

8 double drift (O..n_points— 1) as ’drift’. 

9 drift (I) ~ gauss (cond( I>0 , drift (I— 1), 0), 

10 cond(I>0, 2.0, 1.0)). 
n 

12 % QUANTITY BEING PREDICTED, THE ‘‘NEXT” POINT 

13 double drift_next as ‘future drift’. 

14 DRIFT -NEXT ~ GAUSS ( DRIFT ( N-POINTS -1) , 1.0). 

15 

16 CONST double MEAS.error AS ’std.dev. of measurement’. 

17 

is % Gaussian observation (measurement) 

19 data double meas ( 0 . . N_POlNTS — 1) AS ’measurement’. 

20 MEAS ( I ) ~ GAUSS (DRIFT ( I ) , MEAS_ERROR) . 

21 

22 % QUANTITY BEING PREDICTED 

23 DOUBLE MEAS-NEXT AS ’future measurement’. 

24 MEAS-NEXT ~ GAUSS ( MEAS ( N-POINTS — 1 ) , MEAS_ERROR) . 

25 

26 MAX PR( { DRIFT-NEXT, MEAS_NEXT , DRIFT } | { MEAS } ) 

27 FOR { DRIFT-NEXT , MEAS-NEXT , DRIFT } . 

Listing 8.20: AutoBayes model for a simple Kalman filter. 

Listing 8.20 shows a specification for an extremely simple Kalman filter to estimate 
the drift rate of a gyro, for example. The drift is defined as a Gaussian random walk 
(vector), here just with constant a 2 of 1 and 2. Note the usage of cond to specify 
the recursive equation. The estimation of the next drift value in time drift-next is 
again Gaussian distributed, as well as the measurements. All unknown parameters, 
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the estimated sequence of the drift values as well as the estimated next drift and 
next measurement, are calculated, given the sequence of measurements in form of a 
matrix. Please note, that AutoBayes does not generate online recursive algorithms 
for Kalman filters. 

8.3.5 Kalman Filters with Failure modes 

A Kalman filter can also detect if and when sensors (or combination of sensors) fail, 
even if this failure is not directly observable. Listing 8.21 shows a somewhat compli- 
cated example of an aircraft sensor suite consisting of a directional and a pitch gyro 
as well as a turn indicator. The AutoBayes specification estimates the next states 
of the system and the most likely failure points for the individual sensors. 

1 model dgtc as ’directional and pitch gyro plus turn coordinator model’ 

2 

3 const nat n_points . 

4 where 0 < n_points . 

5 

6 data double likelihood . 

7 

8% DIRGYROJAILURETT = change point for direct, gyro failing 

9 % PARAMETERS 

io const double dirgyro_failure_prob as ’probability of failure of dir. 
GYRO ’ . 

n dirgyro_failure_prob := 0.5. 

12 nat dirgyro_failure_pt. 

13 where dirgyro_failure_pt in 1 . . n .points. 

14 where dirgyro_failure_pt < n_points+l. 

15 where 0< dirgyro_failure_pt . 

16 dirgyro_failure_pt ~ discrete ( vector ( I := 1 .. n_points , 

17 cond ( I<n_points , dirgyro_failure_prob /( n_points — 1) , 

is 1.0— dirgyro_failure_prob))). 

19 

20 % TG F A 11. 1 IRE PT = change point for turn COORDINATOR failing 

21 % PARAMETERS 

22 const double tc_failure_prob as ’probability of failure of turn 

coordinator ’ . 

23 tc_failure_prob : = 0.5. 

24 nat tc_failure_pt . 

25 where tc_failure_pt in l..n_points. 

26 where tc_failure_pt < n_points+l. 

27 where 0<tc_failure_pt. 

28 t c _f a i 1 ur e _p t ~ discrete ( vector ( I := 1 .. n_points , 

cond ( I<n_points , tc_failure_prob /(n.points— 1),1.0 — 

tc_failure_prob))) . 


29 
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30 

31 

32 % ELECTTAILUREJT = change point for electrical failing 

33 % PARAMETERS 

34 const double elect _failure_prob as ’probability of failure OF 

electrical ’ . 

35 elect.failure.prob := 0.5. 

36 data nat elect _failure.pt . 

37 where elect _failure_pt in 1 . . n_points . 

38 where elect_failure_pt < n_points+l. 

39 where 0 < elect_failure_pt. 

40 elect _failure_pt ~ discrete ( vector ( I := 1 .. n.points, 

41 cond( I<n .points , elect _failure_p rob /(n.points —1) , 

42 1.0— elect.failure.prob))). 

43 

44 % DRIFT = BIASED RANDOM WALK FOR BOTH DIRECT . AND PITCH GYROS 

45% PARAMETERS, INCLUDING THE BASE CASE 

46 const double drift.rate as ’rate of drift per time slice’. 

47 drift.rate := 0.1. 

48 const double drift .error as ’standard deviation of drift per time 

slice ’ . 

49 where drift.error > 0. 
so drift.error := 1.0. 

51 const double drift.base as ’drift initialization’. 

52 drift.base := 0. 

53 const double drift.xbase.error as ’drift initialization of standard 

ERROR ’ . 

54 drift.xbase.error := drift.error * 2 . 

55 

56 % DISTRIBUTION WITH BASE CASE AND RECURSIVE CASE 

57% EXPLICITLY DEFINED USING COND ( ) 

58 double dirdrift ( 0 . . n.points — 1) as ’drift OR dirgyro ANGULAR error’. 

59 dirdrift(I) ~ gauss (cond ( I>0 , d i r d r i f t ( I — 1) , dr i f t _b as e ) + d r i f t _r at e , 

60 cond( I>0 , drift .error , drift.xbase.error )) . 

61 

62% QUANTITY BEING PREDICTED, THE ” NEXT” POINT 

63 double dirdrift .next as ’future dirdrift’. 

64 dirclrift.next ~ gauss ( cl i r d r i ft ( n.points — l)+dri ft _r at e , drift.error). 

65 

66 

67 % ANGLEDIFF = difference of ANGLE 

68 % DISTRIBUTION 

69 double anglediff ( 0 .. n.points — 1) as ’angle’. 

70 anglediff(I) ~ gauss (cond ( I>0 , angledi ff ( I — 1) , 0 ) , 0.02). 

71 

72 % QUANTITY BEING PREDICTED 

73 double anglediff .next as ’future angle difference’. 

74 anglediff .next ~ gauss ( anglediff ( n.points — 1) , 0.02). 



114 


Statistical Models — Examples 


75 

76 

77 % ANGLE = UNBIASED RANDOM WALK 

78 % DISTRIBUTION 

79 double angle ( 0 .. n_points — 1) as ’angle’. 

so angle(I) ~ gauss (cond ( I>0 , angle ( I — l)+angledi f f ( I — 1) , 0 ) , 0.01). 

81 

82 % QUANTITY BEING PREDICTED 

83 double angle.next as ’future angle’. 

84 angle.next ~ gauss ( angle ( n.points — l)+anglecliff ( n.points — 1) , 0.01). 

85 

86 

87 % DIRGYRO = GAUSSIAN CONDITIONED ON ANGLE AND DIRDRIFT ; 

88 % FAILURE CAUSES THE DIRGYRO TO BE STUCK AT LAST GOOD DATA 

89 

90 const double gyro.error as ’error rate for GYROS’. 

91 gyro .error := 0.1. 

92 

93 const double tc.error as ’error rate for turn coordinator’. 

94 

95 data double dirgyro ( 0 .. n.points —1) as ’directional gyro measurement’. 

96 dirgyro(I) ~ gauss (cond ( and ([ I<dir gyro .failure _pt ]) , 

97 angle ( I ) + dirdrift ( I ) , 

98 angle(dirgyro_failure_pt— l)+dirdrift( 

dirgyro_failu re.pt — 1) 

99 ) , 

100 gyro .error ) . 

101 
102 

103 % TURN.COORD = gaussian conditioned on anglediff 

104 

105 data double turn.coord ( 0 . . n.points — 1) as 

106 ’TURN COORDINATOR — RATE OF CHANGE OF BANK’. 

107 

108 turn.coord ( I ) ~ gauss (cond ( and ([ I<t c .fail ure _pt , I<e le c t _f a i 1 u r e _p t ] ) , 

109 anglediff(I), 

no 0) , 

in tc .error). 

112 

113 max pr( { dirgyro_failure.pt , tc_failure.pt , 

114 dirgyro , turn.coord , angle.next , dirdrift .next , 

angled iff. next} 

115 {elect_failure.pt} ) 

116 for { angle.next , clirdrift.next , anglediff .next , 

dir gyro.failu re.pt , 

117 tc_failure.pt }. 

Listing 8.21: AutoBayes model for a Kalman filter with sensor failures. 
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8.4 Reliability Models 

Statistical models in software engineering can be used to model failure rates, rnean- 
time-between-failure, and reliability of a piece of software. The models described in 
this section comprise the standard models found in the literature. The AutoBayes 
models have been developed by B. Fischer. 

Listing 8.22 shows the specification for the basic Jelinski/Moranda (a.k.a., de-eutro- 
phication) model [JM72], In this model, the elapsed time between failures is modeled 
by an exponential distribution; the hazard rate is assumed to be proportional to the 
number of errors remaining in the software and to be constant between two consecutive 
failures. The Jelinski/Moranda model is thus a finite failure, concatenated failure rate 
model. 

The error repair process is modeled as immediate and perfect, i.e., after each failure 
the responsible error is identified and repaired and testing resumes only after the 
repair, and attempts to repair an error are always successful and do not introduce 
new errors. 

Listing 8.22 is a close transcription of the original model, only the parameter loc as 
number of lines of code has been added. Its purpose is to limit the number of errors 
(to be less than the number of lines in the code). 

Listing 8.23 is a Jelinski-Moranda model with conjugate priors as described in [MS83]. 

The Goel-Okumoto model ([G078], Listing 8.24) modifies the basic Jelinski-Moranda 
model by introducing a parameter p for the error repair rate, i.e., the probability that 
a detected error is successfully repaired. This relaxes the “perfect repair” assumption 
of Jelinski-Moranda; however, the assumption that no new bugs are introduced during 
repair still holds. Obviously, for p = 1, the Jelinski-Moranda model appears as special 
case of the Goel-Okumoto model. 

1 model jm as ’(Basic) Jelinski-Moranda model’. 

2 

3 const nat loc as ’lines OF CODE’. 

4 where n < loc . 

5 

6 double n .error as ’initial number OF errors’. 

7 where n =< n .error . 

8 where n .error =< loc . 

9 

10 double c as ’fault detection rate’. 

11 where 0 < c . 

12 

13 % Observation and distribution 
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14 const nat n as ’number of observations’. 

15 where 0 < n . 

16 

17 data double x(0..n— 1) as ’interfailure times (elapsed time between 
errors) ’ . 

is where 0 < x ( I ) . 

19 

20 x ( I ) ~ exponential ( c * (n.error — I)). 

21 

22 max pr (x | { n.error , c }) for { n_error , c } . 

Listing 8.22: Jelinski-Moranda software reliability model 

1 model ms as ’ Jelinski— Moranda model with conjugate priors’. 

2 

3 double n.error as ’initial number OF ERRORS’. 

4 where n =< n .error . 

5 where n .error =< loc . 

6 

7 double c as ’fault detection rate’. 

8 where 0 < c . 

9 

io % Priors 

n const double theta as ’PRIOR ON N.ERROR ’ . 

12 where 0 < theta . 

13 

14 n.error ~ poisson ( theta ) . 

15 

16 const double mu as ’prior ON C (SCALE parameter) ’. 

17 const double alpha as ’prior on c (shape parameter) ’. 

is where 0 < mu. 

19 where 0 < alpha . 

20 

21 c ~ gamma(mu, alpha) . 

22 

23 % Observation and distribution 

24 const nat n as ’NUMBER OF observations’. 

25 where 0 < n . 

26 

27 const nat loc as ’lines of code’. 

28 where n < loc . 

29 

30 data double x(0..n— 1) as ’interfailure times (elapsed time between 

errors) ’ . 

31 where 0 < x ( I ) . 

32 

33 x ( I ) ~ exponential ( c * (n.error — I)). 


34 
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35 max pr({x, n.error , c}) for {n.error , c}. 

Listing 8.23: Jelinski-Moranda model with conjugate priors 


1 model go as ’ Goel— Okumoto model’. 

2 

3 double n.error as ’initial number OF ERRORS’. 

4 where n =< n .error . 

5 where n .error =< loc . 

6 

7 double c as ’fault detection rate’. 

8 where 0 < c . 

9 

10 double p as ’error repair rate’. 

11 where 0 =< p. 

12 where p =< 1 . 

13 

14 % Observation and distribution 

15 const nat n as ’NUMBER OF observations’. 

16 where 0 < n . 

17 

is const nat loc as ’lines of code’. 

19 where n < loc . 

20 

21 data double x( 0 ..n— 1 ) as ’interfailure times (elapsed time between 

errors) ’ . 

22 where 0 < x ( I ) . 

23 

24 x(I) ~ exponential ( c * (n.error — p * I)). 

25 

26 max pr (x | { n.error , c , p}) for { n.error , c , p} . 

Listing 8.24: Goel-Okumoto model 
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Name 

Lines of spec 

Lines of C++ 

Listing 

normal 

14 

199 

8.1 

normal_known_variance 

18 

299 

8.2 

normaLpriors 

22 

207 

8.3 

biased_measurements 

19 

162 

8.4 

logmormal 

15 

155 

8.5 

lighthouse 

23 

502 

8.6 

biased_coin 

11 

91 

8.7 

biased_coins 

14 

132 

8.8 

biased_coins_prior 

20 

228 

8.9 

mog 

27 

587 

8.10 

mult_cluster 

29 

676 

8.11 

mgp_mu 

39 

635 

8.12 

mix_beta_gauss 

38 

999 

8.14 

mpca 

46 

589 

8.15 

walk 

17 

168 

8.16 

hinckley 

25 

353 

8.17 

climb .transition 

38 

888 

8.18 

climb _transition_prior 

49 

1023 

8.19 

kalrnan 

30 

285 

8.20 

dgtc 

116 

735 

8.21 

jrn 

21 

1040 

8.22 

ms 

35 

1109 

8.23 

go 

26 

886 

8.24 


Table 8.2: AutoBayes specifications and size of generated code 
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A.l AutoBayes Command Line Flags 

The AutoBayes system is controlled by a (large) number of command-line options. 

A list of these options can be obtained by calling 

autobayes -help 

The following list gives an alphabetical overview of all available command line flags 1 
[-0 number] set optimization level to N (default— 1) 

[-anngen] generate annotations and VCs from C or intermediate code hie (.lang. dump) 
with -infer SP 

[— c] parse C code (with -anngen/vcgen) 

[-certify (array | defuse | init | inuse | norm | symm | wl}] generate policy-specific an- 
notations for certification 

[-check] check intermediate code for wellformedness 
[-codegen] generate code from intermediate code hie 
[-compile] compile and link synthesized code 
[-debug number] set debug level for code instrumentation 
[-designdoc] generate design document 
[-designdoc filename] generate named design document 
[-dir filename] working directory for all output hies 
[-dot] write output for dot to <specname>.dot 
[-dot filename] write output for dot to <hlename> 

[-dump {all}] dump intermediate code and proof obligations at all stages to hies 
<specname> . <stage> .dump 

1 This list is autogenerated by AutoBayes with autobayes -tex -help. 
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[-dump {synt | iopt | inst | lang | prop | ann | lopt}] dump intermediate code at stage 
to file <specname>.<stage>.dump 

[-dump {synt | iopt | inst | lang | prop | ann | lopt} filename] dump intermediate code 
at stage to file <filename> 

[-dump {tptp}] dump final proof obligations to files <taskname>.dump 

[-dump {vc | nvc | lvc | svc}] dump proof obligations at stage to file 
<specname> . <stage> .dump 

[-dump {vc | nvc | lvc | svc} filename] dump proof obligations at stage to file 
<specname> . <stage> .dump 

[-fastest] report which is the fastest synthesized program (see -maxprog) 

[-geninit] DEVELOP: enable generic variable initialization 

[-genopt] DEVELOP: enable generic optimize schema 

[-help] display usage and list of options 

[-help atom] display usage of option 

[-help {pragmas}] display list of available pragmas 

[-html] write synthesized code as htrnl to hie <specname>.html 

[-html filename] write synthesized code as html to hie <hlcname> 

[-html {synt | iopt | inst | lang | prop | ann | lopt} filename] write intermediate code 
at stage as html to hie <hlename> 

[-html {synt | iopt | inst | lang | prop | ann | lopt | all}] write intermediate code at 
stage as html to hie <specname>.<stage>.html 

[-html.in filename] retrieve html from the html hie <hlename> 

[-infer {array | init | norm | symm | val | frame}] infer policy-specihc annotations for 
certification 

[-instrument] instrument code with conv-vector 
[-interactive] switch into interactive mode 

[- j s filename] write synthesized code as JavaScript to hie <hlename> 

[- j s {synt | iopt | inst | lang | prop | ann | lopt} filename] write intermediate code 
at stage as JavaScript to hie <hlename> 
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[- j s {synt | iopt | inst | lang | prop | ann | lopt | all}] write intermediate code at stage 
as JavaScript to file <specname>.<stage> Jitml 

[-lib (gsl | gslran | unuran}] enable library 

[-lib (gsl | gslran | unuran} filename] enable library (with include-path) 

[-list {all}] list intermediate code and proof obligations at all stages to files 
<specname>.<stage> Jist 

[-list {all} -] list intermediate code and proof obligations at all stages to stdout 

[-list {synt | iopt | inst | lang | prop | ann | lopt}] list intermediate code at stage to 
file <specname>.<stage>.txt 

[-list {synt | iopt | inst | lang | prop | ann | lopt} filename] list intermediate code 
at stage to file <filename> 

[-list {vc | nvc | lvc I svc}] list proof obligations at stage to file <specname>.<stage>.txt 
[-list {vc | nvc | lvc | svc} filename] list proof obligations at stage to file <filename> 
[-log] write information to log file <specname> Jog 
[-log filename] write information to log file <filename> 

[-matlab {ann}] (stage 2: ann) write intermediate code as Matlab-readable JavaScrip- 
t/HTML to file <specname>_<policy> Jang.html, with main hie 
<specname>. certihcation.html 

[-matlab {lang}] (stage 1: lang) write synthesized code as Matlab-readable JavaScrip- 

t/HTML to hie <specname>. lang.html, where main hie <specname>. certification Jitml 

[-matlab {proofs}] (stage 4: proofs) Generates Stage(l-3). Executes run_all prover 
on VCs and return result in Matlab-readable JavaScript /HTML, where main hie 
is <specname>_<policy>. certification Jitml 

[-matlab {tasks}] (stage 3: tasks) write intermediate code as Matlab-readable 
JavaScript/HTML, where main hie <specname>_<policy>. certihcation.html 

[-maxprog number] synthesize up to N program versions (default=l) 

[-monitorapproximations] synthesize code to check that approximation error bounds 
are respected 

[-monte.carlo] synthesize monte-carlo data for Kalman hlters [Autohlter only] 

[-nocode] synthesize intermediate code only 
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[-no optimize] set optimization level to 0 (no optimization) 

[-php filename] write synthesized code as php to file <filename> 

[-php {synt | iopt | inst | lang | prop | ann | lopt} filename] write intermediate code 
at stage as php to hie <hlename> 

[-php {synt | iopt | inst | lang | prop | ann | lopt | all}] write intermediate code at stage 
as php to hie <specname>.<stage>.php 

[-pragma atom=atom] set pragma to value 

[-prover {tptpl esetheo | ivy}] use specihed prover for certification [inactive] 
[-quiet] reduce log/trace information 
[-sample] synthesize code for data sampling 
[-silent] suppress all log/trace information 

[-target {c_standalone | matlab | modula2 | octave | spark}] target language and run- 
time environment 

[-tex] write synthesized code as latex to hie <specname>.tex 

[-tex filename] write synthesized code as latex to hie <hlcname> 

[-tex {synt | iopt | inst | lang | prop | ann | lopt} filename] write intermediate code 
at stage as latex to hie <hlename> 

[-tex {synt | iopt | inst | lang | prop | ann | lopt | all}] write intermediate code at stage 
as latex to hie <specname>.<stage>.tex 

[-timelimit number] set overall timelimit 

[-timelimit number number number] set timelimits for simplifier, solver, optimizer 
[-user filename] user information [only for web autobayes] 

[-vcgen] generate VCs from C or annotated intermediate code hie (.ann. dump / 
.prop. dump) with -infer SP or -certify SP, resp. 

A. 2 AutoBayes Pragmas 

AutoBayes pragmas are low-level hags and commands to control specihc actions 
in the AutoBayes system. Their main purpose is to help the developer and and- 
vanced user to guide the AutoBayes system in a specihc way. A complete list of 
AutoBayes pragmas can be obtained by 
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$ autobayes -help pragmas 
In general, a pragma has the form 
-pragma <NAME>=<VALUE> 

No spaces are allowed between the name of the pragma, the equality sign , and 
the value. 

In the following we list all AutoBayes pragmas 2 . For each pragma, its name and 
type is given. Default values and, where applicable, a list of possible values are shown. 
AutoBayes supports the following types of pragmas: 

boolean are boolean flags, which can take the values true or false 

integer can take arbitrary integer values 

atom can take a Prolog atomic value, i.e., a name like none, or a string in single 
quotes, e.g., ’this model’. If a list of possible values is given, only arguments 
matching with an element of this list can be used. 

callable requires the name of a predicate. This feature enables the developer to 
call specific predicates in conjunction with this predicate. The arity of such a 
predicate depends on it actual calling environment. No checks whatsoever are 
performed. 

assert_display_location (boolean) Display source code locations in trace mes- 
sages 

Default: -pragma assert_display_location=true 

certify.browser (atomic) set to open a browser (or browser tab) after execution 
of code 

Default: -pragma certify_browser=none 
Possible values : 
none no browser 
firefox open firefox browser 
mozilla open mozilla browser 
safari open safair browser 
netscape open netscape browser 
This list is autogenerated by the command autobayes -tex -help pragmas. 


2 
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certify_explicit_symm (boolean) use symm predicate in annotations 

Default: -pragma certify_explicit_symm=f alse 

certify_external_init (boolean) Treat external declarations as being initialized 

Default: -pragma certify_external_init=f alse 

certify_f ilename.policy (boolean) use safety policy in filename 

Default: -pragma certify_f ilename_policy=true 

certify_generate_proof .tasks (boolean) Generate certification proof tasks: gets 
set to true if -certify or -infer called 

Default: -pragma certify_generate_proof _tasks=f alse 

certify_generate_saf ety.doc (boolean) generate rendered safety document from 
inference and VC information 

Default: -pragma certify_generate_saf ety_doc=f alse 

certify_generate_true_prooftask (boolean) Generate at least one proof task 

Default: -pragma certify_generate_true_prooftask=true 

certify_globals_strict (boolean) Globals must be initialized according to decl 
lists in procs and funcs 

Default: -pragma certify_globals_strict=f alse 
certify_globals_visible (boolean) Globals are visible throughout 
Default: -pragma certify_globals_visible=true 
certify_hotvar (atomic) Active hotvar for annotation inference 
Default: -pragma certify_hotvar=$all 
Possible values : 

$all Infer annotations for all hotvars 

_ Infer annotations for specified hotvar only 

certify_itar_warning (boolean) insert ITAR warning at top of safety report 

Default: -pragma certify_itar_warning=f alse 

certify_label_stage (atomic) The stage at which line numbers are added during 
annotation inference 
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Default: -pragma certify_label_stage=lang 

Possible values : 

source Assume to already exist in parsed source 
lang Add before inference 
ann Add after inference 

certify_limit_def s (boolean) Limit the def patterns to be generator specific 

Default: -pragma certify_limit_def s=true 

certify_list_def s (boolean) List which defs have been successfully annotated 

Default: -pragma certify_list_def s=true 

certify_only_def s (boolean) Only annotate the defs 

Default: -pragma certify_only_def s=f alse 

certify_ordered_anns (boolean) process pre- and post-conditions in order 

Default: -pragma certify_ordered_anns=f alse 

certify_render_lterm (boolean) display lterm with rendered VCs 

Default: -pragma certify_render_lterm=f alse 

certify_semantic_labels (boolean) add semantic markup to VCs and display ex- 
planations in certification browser 

Default: -pragma certify_semantic_labels=f alse 

certif y_semantic_labels_order (callable) sort order for display of semantic markup 
in VCs 

Default: -pragma certify_semantic_labels_order=render_sort_labels_by_line 

Possible values : 

render_sort_labels_by_line 

certify_stream_vcs (boolean) Output VCs as they are generated 
Default: -pragma certify_stream_vcs=f alse 
certify_transparent_anns (boolean) use transparent annotation rules in VCG 
Default: -pragma certify_transparent_anns=f alse 



126 


Command Line Options 


certify_transparent_inf (boolean) only annotate opaque barriers during infer- 
ence 

Default: -pragma certify_transparent_inf=f alse 
certify use (atomic) Active use number for annotation inference 
Default: -pragma certify_use=$all 

Possible values : 

$all Infer annotations for all uses of a given hotvar 
_ Infer annotations for a specified use of a given hotvar 
certify_use_postconditions (boolean) use postconditions in VCG 
Default: -pragma certify_use_postconditions=true 
certify_vc_label (callable) Pre-simplifier for proof tasks (labeling and splitting) 
Default: -pragma certify_vc_label=vc_label 
Possible values : 

vc_label strongest simplification (default) 
vc_label_structure_prop 
vc_label_structure 
vc_identity no simplification 

certify_vc_normalize (callable) Normalizer for proof tasks (integrated into VCG) 
Default: -pragma certify_vc_normalize=vc_normalize 

Possible values : 

vc_normalize default normalization 
vc_normalize_auxiliary 
vc_normalize_f list 
vc_normalize_subst 
vc_identity no normalization 

certif y_vc_simplif y (callable) Simplifier for proof tasks (after conversion into FOL) 
Default: -pragma certify_vc_simplify=vc_simplify 
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Possible values : 

vc_simplify strongest simplification (default) 
vc_identity no simplification 

cg_comment_style (atomic) select comment style for C/C++ code generator 
Default: -pragma cg_comment_style=cpp 
Possible values : 

kr use traditional (KR) style comments 
cpp use C++ style comments / / 

cluster.pref (atomic) select algorithm schemas for hidden- variable (clustering) prob- 
lems 

Default: -pragma cluster_pref =em 

Possible values : 
em prefer EM algorithm 
no_pref no preference 
k_means use k-means algorithm 

codegen_ignore_inconsistent_term (boolean) [DEBUG] ignore inconsistent-term 
conditional expressions in codegen 

Default: -pragma codegen_ignore_inconsistent_term=f alse 
em (atomic) preference for initialization algorithm for EM 
Default: -pragma em=no_pref 
Possible values : 
no_pref no preference 
center center initialization 
sharp.class class-based initialization (sharp) 
fuzzy.class class-based initialization (fuzzy) 
em_log_likelihood_convergence (boolean) converge on log-likelihood-function 
Default: -pragma em_log_likelihood_convergence=f alse 
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era_q_output (boolean) Output the Q matrix of the EM algorithm 
Default: -pragma em_q_output=f alse 

em_q_update_simple (boolean) force the q-update to just contain the density func- 
tion 

Default: -pragma em_q_update_simple=f alse 

ignore_division_by_zero (boolean) DEBUG: Do not check for X=0 in X**(-l) 
expressions 

Default: -pragma ignore_division_by_zero=f alse 

ignore_zero_base (boolean) DEBUG: Do not check for zero-base in X**Y expres- 
sions 

Default: -pragma ignore_zero_base=f alse 
il_extended (boolean) use extended intermediate language. Set with -c 
Default: -pragma il_extended=f alse 

inf ile_cpp_pref ix (atomic) Prefix for intermediate input hie after cpp(l) process- 
ing 

Default: -pragma inf ile_cpp_pref ix=cpp_ 

instrument_convergence_save_ub (integer) default size of instrumentation vector 
for convergence loops 

Default: -pragma instrument_convergence_save_ub=999 
lopt (boolean) Turn on/off optimization of the lang code 
Default: -pragma lopt=false 

optimize_cse (boolean) enable common subexpression elimination 
Default: -pragma optimize_cse=true 

optimize_expression_inlining (boolean) enable inlining (instead function calls) 
of goal expressions by schemas 

Default: -pragma optimize_expression_inlining=true 

optimize_max_unrolling_depth (int) maximal depth of for- loops w/ constant bound 
to be unrolled 

Default: -pragma optimizeunax_unrolling_depth=3 
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optimize_meraoization (boolean) enable subexpression-mcmoization 

Default: -pragma optimize_memoization=true 

optimize_substitute_constants (boolean) allow values of constants to be substi- 
tuted into loop bounds 

Default: -pragma optimize_substitute_constants=true 
pp_html_f ixed_f ont_f amily (atomic) Font family for fixed fonts in html-output 
Default: -pragma pp_html_f ixed_f ont_f amily=courier new 
Possible values : 
courier new default font 
other fonts possible 

pp_html_f ixed_f ont.size (int) Font size for fixed fonts in html-output 
Default: -pragma pp_html_f ixed_f ont_size=14 
ppJitml_f ont_color_active (atomic) Font color for active in html-output 
Default: -pragma pp_html_f ont_color_active=#6699FF 
Possible values : 

#6699FF default color 
other colors possible 

ppJitml_f ont_color_annotation (atomic) Font color for annotation in html-output 
Default: -pragma pp_html_f ont_color_annotation=purple 

Possible values : 
purple default color 
other colors possible 

ppJitml_f ont_color_comment (atomic) Font color for comment in html-output 
Default: -pragma pp_html_f ont_color_comment=green 

Possible values : 
green default color 
other colors possible 
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pp_html_f ont_color_highlight (atomic) Font color for highlight in html-output 
Default: -pragma pp_html_f ont_color_highlight=red 

Possible values : 
red default color 
other colors possible 

pp_html_f ont_color_hover (atomic) Font color for hover in html-output 
Default: -pragma pp_html_f ont_color_hover=#6699FF 
Possible values : 

#6699FF default color 
other colors possible 

pp_html_f ont_color_label (atomic) Font color for label in html-output 
Default: -pragma ppJitml_f ont_color_label=blue 

Possible values : 
blue default color 
other colors possible 

pp_html_f ont_color_label_ref (atomic) Font color for label in html-output 
Default: -pragma ppJitml_f ont_color_label_ref =#FF9966 

Possible values : 

#FF9966 default color 
other colors possible 

pp_html_f ont_color_link (atomic) Font color for link in html-output 
Default: -pragma ppJitml_f ont_color_link=#6699FF 

Possible values : 

#6699FF default color 
other colors possible 

pp_html_f ont_color_schema (atomic) Set font colors for html-output 
Default: -pragma ppJitml_f ont_color_schema=def ault 
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Possible values : 

default 

bw 

ppJitml_f ont_color_visited (atomic) Font color for visited in html-output 
Default: -pragma pp_html_f ont_color_visited=#6699FF 

Possible values : 

#6699FF default color 
other colors possible 

pragmas _detailed_help (boolean) Print detailed information on Pragmas in -help 
pragmas 

Default: -pragma pragmas_detailed_help=true 
prolog.style (boolean) Capitalized names are variables 
Default: -pragma prolog_style=true 

propagate.annotations (atomic) propagate explicit annotations during certifica- 
tion 

Default: -pragma propagate_annotations=true 

Possible values : 

true Propagate annotations after lang stage 
false Do no propagation 
inf er_ann_pre Propagate before inference 
inf er_ann_post Propagate after inference 
propagate_index_bounds (boolean) propagate index bounds during certification 
Default: -pragma propagate_index_bounds=true 
rwr_cache_max (integer) size of rewrite cache 
Default: -pragma rwr_cache_max=2048 

schema_control_arbitrary_init_values (boolean) enable initialization of goal vari- 
ables w/ arbitrary start /step values 

Default: -pragma schema_control_arbitrary_init_values=f alse 
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schema_control_init_values (atomic) initialization of goal variables 
Default: -pragma schema_control_init_values=automatic 

Possible values : 

automatic calculate best values 
arbitrary use arbitrary values 

user user provides values (additional input parameters 

schema_control_solve_partial (boolean) enable partial symbolic solutions 

Default: -pragma schema_control_solve_partial=true 

schema_control_use_generic_optimize (boolean) enable intermediate code gener- 
ation w/ generic optimize(...)-statements 

Default: -pragma schema_control_use_generic_optimize=f alse 

synth_serialize_maxvars (integer) maximal number of solved variables eliminated 
by serialize 

Default: -pragma synth_serializeunaxvars=0 
system.os (atomic) generate html for target os 
Default: -pragma system_os=linux 
Possible values : 
linux 
windows 

trace_browser_f iles (boolean) Trace I/O of certification browser files 

Default: -pragma trace_browser_f iles=f alse 

trace_display_solver_obligations (boolean) display proof obligations from the 
solver 

Default: -pragma trace_display_solver_obligations=true 
trace_vc_f iles (boolean) Trace I/O of VC files 
Default: -pragma trace_vc_f iles=true 
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and validation rather than manual development of solution algorithms and code. 
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